Simple Centralized Logs Monitoring with Grafana Loki

April 21, 2025

For IT system administrators, managing the logs of an IT system is essential. Each time an incident comes in and the problem has to be investigated and troubleshot, someone has to log in to the server and read its log files. Often that means checking several machines, which gets complicated, and without a centralized logs monitoring system it is fairly painful.

So how can we get started with a centralized logs monitoring system? This blog post walks through a simple example of setting one up.

Theory

Definition of log files

Before building a centralized logs monitoring system, let's first look at what a log file is.

Log files are files generated by software that record the operations, activities, and usage patterns of an application, server, or IT system.
A log file contains a historical record of processes and events, together with descriptive data such as timestamps that indicate when each event occurred and other context about the data.

Example types of log files

Event logs

High-level logs that record system activity and are used for troubleshooting issues. Examples: network traffic logs, access logs, usage logs.

System logs

Records of operating system events, such as system changes, errors, and warnings.

Access logs

A list of every request from users or applications to access a system, including user authentication and which systems or resources the user requested.

Server logs

Log files that a server creates and maintains automatically, containing details such as the client IP address and the type of request that reached the server.
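
To make the fields mentioned above concrete, here is a minimal sketch that pulls the client IP address, request method, path, and status code out of an access-log line. It assumes the common Nginx "combined" log format; the function name `summarize_access_log` and the sample line are hypothetical and only serve as an illustration.

```shell
# Print "client_ip method path status" for each access-log line on stdin.
# Assumes the Nginx "combined" format, split on whitespace.
summarize_access_log() {
  awk '{
    gsub(/"/, "", $6)   # field 6 is the method with a leading quote, e.g. "GET
    print $1, $6, $7, $9
  }'
}

# Hypothetical sample line (203.0.113.0/24 is a documentation-only range)
sample='203.0.113.7 - - [21/Apr/2025:10:15:32 +0700] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.5.0"'
printf '%s\n' "$sample" | summarize_access_log
# prints: 203.0.113.7 GET /index.html 200
```

On a real server the same function could read `/var/log/nginx/access.log` directly, but only on that one machine, which is exactly the limitation discussed next.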

Why centralized logs?

The main reason is that when logs are kept on the resource itself, for example on the server that produced them, reading those logs means logging in to that server. This is not much of a problem while there are only a few servers, say 1-3 machines.

In practice, though, we usually look after more than that. As a system keeps evolving, the number of servers grows with usage, and adding containers multiplies the number of log sources further. Investigating problems becomes harder, and so does troubleshooting.

This is what motivates centralized logs monitoring: administrators can access logs from a single place. Another benefit is that when a server goes down, its logs can still be read from the centralized system.

Grafana Loki Logs Stack

Now that we have covered the definition, the example types of logs, and the reasons for centralized logs monitoring, let's meet the star of today's post: Grafana Loki.

Grafana Loki

Grafana Loki is a set of open-source components that can be composed into a fully featured logging stack. It keeps only a small index and compresses log content into chunks, which makes logs easier to manage and cheaper to store.

Loki's standout feature is that it indexes only metadata, in the form of log labels, while the log data itself is compressed and stored as chunk objects. Those chunks can be kept in object stores such as Amazon Simple Storage Service (S3) or Google Cloud Storage (GCS).

Loki logging stack

The Loki stack has 3 main parts:

Agent - scrapes logs, turns them into streams by attaching labels, and pushes the log streams to Loki through its HTTP API. Examples of agents are Grafana Alloy and Promtail.
Loki - the main server, responsible for ingesting and storing logs and for processing log queries.
Grafana - used for querying and displaying log data.
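
To show what "a stream with labels pushed over the HTTP API" looks like, the sketch below builds the JSON body accepted by Loki's `/loki/api/v1/push` endpoint: a set of labels plus a list of `[timestamp in nanoseconds, log line]` pairs. The helper names `json_escape` and `loki_push_payload`, the label values, and the sample line are assumptions for illustration; a real agent such as Promtail does this for you.

```shell
# Escape backslashes and double quotes so a log line fits inside a JSON string.
# (Control characters such as tabs are not handled in this sketch.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a push payload with one stream and one entry.
# Arguments: job label, host label, timestamp in nanoseconds, log line.
loki_push_payload() {
  printf '{"streams":[{"stream":{"job":"%s","host":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$1" "$2" "$3" "$(json_escape "$4")"
}

loki_push_payload nginx web01 1745205332000000000 'GET /index.html "200"'
```

The output could then be sent with something like `curl -X POST -H 'Content-Type: application/json' -H 'X-Scope-OrgID: servers' --data-binary @- http://<GRAFANA_LOKI_URL>:8080/loki/api/v1/push`, where the `X-Scope-OrgID` header carries the tenant and the host and port depend on how Loki is deployed.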

Loki architecture

[Figure: Loki architecture]

The architecture splits the work into 2 main paths:

Read path - the green line in the diagram. This is the path that queries logs back out so we can read what was stored. Alerting with Alertmanager can also be attached to this path: log queries defined as alert rules are evaluated periodically (in Loki this evaluation is done by the ruler component), and when the log content matches a condition we defined, Alertmanager sends a notification to the configured channel, for example by email.
Write path - the blue line in the diagram. This is the path that writes logs to long-term storage. In this diagram the storage is MinIO, an open-source object storage tool.

Practice

With the theory covered, let's move on to the practice lab.

For this lab we use the Simple Scalable Deployment mode, which separates the read and write paths into separately running services. Container technology (Docker) is used to run each of the Grafana Loki services.

⚠️

This lab has limitations around High Availability for the Grafana Loki server, so it may not suit highly critical systems.

If criticality is not a concern, the lab is a good fit for study and experimentation, or for low-load systems where it is acceptable for the log server to go down.

Overview

This lab sends logs from two Ubuntu servers, Web01 running Nginx and Db01 running PostgreSQL, to Grafana Loki, which acts as the log aggregation system that collects logs centrally. The logs are then queried through the Grafana UI.

Implement Centralized Logs Server

In this lab the centralized logs monitoring system is deployed with Docker. The docker-compose file is adapted from the Grafana Loki GitHub repo, and the source code can be downloaded from here:

```
git clone [email protected]:nestpractice/observability-simpleloki.git
cd observability-simpleloki
```

Install Docker Engine and Docker Compose on the centralized logs server by following this document: https://docs.docker.com/engine/install/
Then run this command to bring up the centralized logging stack on the server:
```
docker compose -f simple-loki/docker-compose.yaml up -d
```
Check that all components are running normally:
```
docker ps
```
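
With several containers in the stack, scanning the `docker ps` table by eye is easy to get wrong. As a rough sketch, the helper below (the name `report_not_running` is hypothetical) reads `name|status` lines and prints only the containers whose status does not start with `Up`.

```shell
# Read "name|status" lines on stdin and print containers whose status does
# not start with "Up". Exit status is 1 if any such container is found.
report_not_running() {
  awk -F '|' '$2 !~ /^Up/ { print $1 ": " $2; bad = 1 } END { exit bad }'
}

# Intended usage on the log server (needs Docker, so it is left commented out):
#   docker ps -a --format '{{.Names}}|{{.Status}}' | report_not_running
```

No output and an exit status of 0 means every container reported an `Up` status. This only reflects what Docker reports; it does not prove that Loki itself is ready to accept logs.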
Open a web browser and confirm that Grafana is reachable. A Grafana Loki datasource should already be present, as shown in the figure. (By default, Grafana is reached via the server's IP address on port 3000.)

Set up Promtail to scrape logs from the Nginx and PostgreSQL servers

Nginx Server

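Install Promtail, the agent that collects logs from the server.

```
# Create a working directory for the download, owned by the current user
sudo mkdir -p /opt/promtail && sudo chown "$USER" /opt/promtail && cd /opt/promtail

# Install the tools needed to fetch and unpack the release archive
sudo apt install -y unzip wget
wget https://github.com/grafana/loki/releases/latest/download/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
chmod +x promtail-linux-amd64

# Put the binary on the PATH under the name used by the systemd unit
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
```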

Create the Promtail configuration file

```
sudo mkdir -p /etc/promtail
sudo vim /etc/promtail/promtail-config.yaml
```

/etc/promtail/promtail-config.yaml

```
server:
  http_listen_port: 9080
  grpc_listen_port: 0   # 0 picks a random free port for gRPC

# File where Promtail records how far it has read in each log file
positions:
  filename: /var/log/positions.yaml

clients:
  - url: http://<GRAFANA_LOKI_URL>:8080/loki/api/v1/push
    tenant_id: servers

scrape_configs:
  - job_name: nginx-access
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          # ${HOSTNAME} is only expanded when Promtail runs with -config.expand-env=true
          host: ${HOSTNAME}
          __path__: /var/log/nginx/access.log

  - job_name: nginx-error
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          host: ${HOSTNAME}
          __path__: /var/log/nginx/error.log
```
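
The `<GRAFANA_LOKI_URL>` placeholder in the configuration has to be replaced with the address of the centralized logs server on every machine. One possible way to avoid hand-editing is to keep the file as a template and fill it in with a small helper. The function name `fill_loki_host`, the template file name, and the address `192.0.2.10` below are assumptions for illustration.

```shell
# Replace the <GRAFANA_LOKI_URL> placeholder on stdin with the given host.
# The host must not contain "|" or "&", which are special to this sed call.
fill_loki_host() {
  sed "s|<GRAFANA_LOKI_URL>|$1|g"
}

# Intended usage (hypothetical template file name):
#   fill_loki_host 192.0.2.10 < promtail-config.yaml.tmpl \
#     | sudo tee /etc/promtail/promtail-config.yaml > /dev/null
```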

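Set up a systemd service so that Promtail runs as a daemon on the server

```
sudo vim /etc/systemd/system/promtail.service
```

/etc/systemd/system/promtail.service

```
[Unit]
Description=Promtail service
After=network.target

[Service]
Type=simple
# Expose the machine's hostname as HOSTNAME (%H is the systemd hostname specifier)
Environment=HOSTNAME=%H
# -config.expand-env=true lets Promtail expand ${HOSTNAME} in the config file
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yaml -config.expand-env=true
Restart=on-failure

[Install]
WantedBy=multi-user.target
```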

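Reload the systemd configuration

```
# daemon-reexec restarts the systemd manager itself; daemon-reload alone is
# normally enough to pick up a new unit file
sudo systemctl daemon-reexec
sudo systemctl daemon-reload
# Start Promtail now and enable it at boot
sudo systemctl enable --now promtail
```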
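
Once the service is running, one way to confirm that Promtail is actually reading the log files is to look at the positions file configured earlier (`/var/log/positions.yaml`), where it records a byte offset per file. The sketch below assumes the usual layout of that file, a `positions:` key followed by indented `path: "offset"` entries; the function name `list_positions` is hypothetical.

```shell
# Print "path offset" for every entry of a Promtail positions file on stdin.
list_positions() {
  awk '/^  [^ ]/ {
    line = $0
    sub(/^  /, "", line)      # drop the two-space indentation
    gsub(/"/, "", line)       # offsets are stored as quoted strings
    i = match(line, /: [0-9]+$/)
    if (i) print substr(line, 1, i - 1), substr(line, i + 2)
  }'
}

# Intended usage on the server:
#   sudo cat /var/log/positions.yaml | list_positions
```

If the offsets grow between two runs while Nginx is serving traffic, Promtail is tailing the files. `systemctl status promtail` and `journalctl -u promtail` remain the first places to look when nothing shows up.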

Check the Nginx logs in Grafana by selecting the label job = nginx,
or open the Drilldown page to view the logs.
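
Selecting `job = nginx` in Grafana corresponds to the LogQL stream selector `{job="nginx"}`. As a sketch, the helper below (the name `logql_selector` is hypothetical) builds such a selector from label and value pairs, which can then be used against Loki's `query_range` HTTP API when Grafana is not at hand.

```shell
# Build a LogQL stream selector from label/value pairs.
# Example: logql_selector job nginx host web01  ->  {job="nginx", host="web01"}
logql_selector() {
  sel=''
  while [ "$#" -ge 2 ]; do
    sel="${sel}${sel:+, }$1=\"$2\""
    shift 2
  done
  printf '{%s}\n' "$sel"
}

logql_selector job nginx

# Intended usage against Loki (needs network access, so it is commented out;
# the host, port, and tenant follow the Promtail configuration in this lab):
#   curl -G -s "http://<GRAFANA_LOKI_URL>:8080/loki/api/v1/query_range" \
#     -H 'X-Scope-OrgID: servers' \
#     --data-urlencode "query=$(logql_selector job nginx)" \
#     --data-urlencode 'limit=20'
```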

PostgreSQL server

Configure PostgreSQL to write its logs to a file so that the Promtail agent can scrape them, by editing /etc/postgresql/<version>/main/postgresql.conf.
Then reload the PostgreSQL configuration:
```
sudo systemctl reload postgresql
```
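
The exact settings to change are not listed here. Purely as an illustration, the sketch below prints a few logging settings that take effect on a reload; the values are assumptions, not the configuration used in this lab. On Ubuntu's packaged PostgreSQL the server log typically already goes to a file under /var/log/postgresql/, so these settings mainly control what gets written. Changing `logging_collector`, by contrast, requires a restart rather than a reload.

```shell
# Print example logging settings (illustrative values only).
pg_logging_settings() {
  cat <<'EOF'
log_connections = on
log_disconnections = on
log_min_duration_statement = 500
log_line_prefix = '%m [%p] %q%u@%d '
EOF
}

pg_logging_settings

# Intended usage: append to the config file, then reload as shown above.
#   pg_logging_settings | sudo tee -a /etc/postgresql/<version>/main/postgresql.conf
```

Here `log_min_duration_statement = 500` logs statements that run longer than 500 ms, and `log_line_prefix` adds the timestamp, process ID, user, and database to each line.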

Install Promtail

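```
# Create a working directory for the download, owned by the current user
sudo mkdir -p /opt/promtail && sudo chown "$USER" /opt/promtail && cd /opt/promtail

# Install the tools needed to fetch and unpack the release archive
sudo apt install -y unzip wget
wget https://github.com/grafana/loki/releases/latest/download/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
chmod +x promtail-linux-amd64

# Put the binary on the PATH under the name used by the systemd unit
sudo mv promtail-linux-amd64 /usr/local/bin/promtail
```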

Create the Promtail configuration file

```
sudo mkdir -p /etc/promtail
sudo vim /etc/promtail/promtail-config.yaml
```

/etc/promtail/promtail-config.yaml

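```
server:
  http_listen_port: 9080
  grpc_listen_port: 0   # 0 picks a random free port for gRPC

# File where Promtail records how far it has read in each log file
positions:
  filename: /var/log/positions.yaml

clients:
  - url: http://<GRAFANA_LOKI_URL>:8080/loki/api/v1/push
    tenant_id: servers

scrape_configs:
  - job_name: postgresql-logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: postgres
          # ${HOSTNAME} is only expanded when Promtail runs with -config.expand-env=true
          host: ${HOSTNAME}
          __path__: /var/log/postgresql/postgresql-*.log
```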

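Set up a systemd service so that Promtail runs as a daemon on the server

```
sudo vim /etc/systemd/system/promtail.service
```

/etc/systemd/system/promtail.service

```
[Unit]
Description=Promtail service
After=network.target

[Service]
Type=simple
# Expose the machine's hostname as HOSTNAME (%H is the systemd hostname specifier)
Environment=HOSTNAME=%H
# -config.expand-env=true lets Promtail expand ${HOSTNAME} in the config file
ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/promtail-config.yaml -config.expand-env=true
Restart=on-failure

[Install]
WantedBy=multi-user.target
```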

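Reload the systemd configuration

```
# daemon-reexec restarts the systemd manager itself; daemon-reload alone is
# normally enough to pick up a new unit file
sudo systemctl daemon-reexec
sudo systemctl daemon-reload
# Start Promtail now and enable it at boot
sudo systemctl enable --now promtail
```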

Check the PostgreSQL logs in Grafana by selecting the label job = postgres,
or open the Drilldown page to view the logs.

Summary

This article built a simple centralized logs monitoring system. Some parts are still incomplete, for example:

Label management for stored logs, so that logs are easier to search and read.
High Availability of the Grafana Loki server. Because the whole centralized logs monitoring system runs on a single server, it may not cope with high load. If many systems need to ship logs, consider a setup that scales easily, such as deploying on a Kubernetes cluster.
Security on the Grafana Loki side. By default there is no HTTPS and no authentication. On an internal network the traffic can still be controlled to some degree, for example by allowing only the IP range of the servers that ship logs to reach Grafana Loki, but that is not enough once one of those servers has been compromised. It is then up to the system administration team to add authentication. (For more on securing Grafana Loki, see the separate article on that topic.)

Even so, having a centralized logs monitoring system is still better than having none. We just need to assess the risk: start with a non-critical system, such as a Development environment, to tune both the labels and the security, and only once that tuning is complete roll it out to Production.

Last updated on April 28, 2025