
Monitoring

Monitoring relies on Prometheus (metrics), Grafana (dashboards), and Loki (logs). It is organized by zone: within each zone, one host collects and the other hosts expose their metrics.

Diagram
| Role | Who | What |
|---|---|---|
| Collector | Host with prometheus (+ monitoring, + loki) | Metrics & alerts, dashboards, logs |
| Monitored node | Hosts tagged monitoring-node | node-exporter (system metrics) |
| Log source | Hosts behind Caddy | Alloy pushes access logs |

prometheus (metrics + alerts) and monitoring (Grafana) are two separate services on the same host; Grafana requires Prometheus. Details are in Monitoring & Alerts.

Monitoring is a service. On the collector:

etc/config.yaml
services:
  prometheus: # metrics + alerts (required by monitoring)
  monitoring:
    domain: "stats" # → stats.<zone>.domain.tld
  • Nodes to monitor receive the monitoring-node tag from their zone.
  • loki follows monitoring: Caddy logs are collected automatically as soon as Caddy and monitoring are active.
| Option | Effect | Default |
|---|---|---|
| prometheus.retentionTime | Prometheus metrics retention | 30d |
| monitoring.isNode | Makes the host a monitored node | follows the tag |
| monitoring.kioskTarget | Dashboard displayed at the root | node-exporter |
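
As an illustration, the options above might be set on the collector as in the sketch below. It assumes that each dotted option name maps to a key nested under the matching entry in services, which the table does not state explicitly. The 90d retention value is hypothetical and simply mirrors the format of the 30d default.

```yaml
# etc/config.yaml on the collector.
# Sketch only: the nesting is inferred from the dotted option names.
services:
  prometheus:
    retentionTime: "90d"          # hypothetical value; the default is 30d
  monitoring:
    domain: "stats"               # → stats.<zone>.domain.tld
    kioskTarget: "node-exporter"  # the default, written out explicitly
    isNode: true                  # assumption: forces monitored-node behavior regardless of the tag
```

Leaving an option out keeps its default, so in most setups only domain needs to be set; isNode would matter only for a host whose zone does not already give it the monitoring-node tag.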