
Prometheus alerts

Maintenance map of the alert code: what goes where, and how to extend it without breaking anything. The admin-facing operation is described in Monitoring & Alerts.

The code is split in two to keep rule generation testable:

  • dnf/lib/alerts.nix: pure functions that produce Prometheus rules from the topology. Tested in dnf/tests/unit/lib/alerts_test.nix.
  • dnf/modules/service/prometheus.nix: impure wiring (Alertmanager, routing by severity, sops, Matrix bot, vhost, blackbox probes).

Any non-trivial logic goes into alerts.nix; the module merely plugs it in.
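To illustrate why keeping the logic pure makes it easy to test, here is a minimal sketch of a unit test written with lib.runTests from nixpkgs. The severityForClass defined here is a stand-in with assumed behaviour, not the real helper, and the actual layout of alerts_test.nix may differ.

```nix
# Sketch only: `severityForClass` is a stand-in for the real helper in
# dnf/lib/alerts.nix, and its exact behaviour is an assumption.
let
  # Requires `nixpkgs` to be resolvable on NIX_PATH.
  lib = import <nixpkgs/lib>;

  # Assumed mapping: critical nodes alert as "critical", all others as "warning".
  severityForClass = class: if class == "critical" then "critical" else "warning";
in
# runTests returns the list of failing cases; an empty list means all passed.
lib.runTests {
  testCriticalClass = {
    expr = severityForClass "critical";
    expected = "critical";
  };
  testNonCriticalClass = {
    expr = severityForClass "non-critical";
    expected = "warning";
  };
}
```

Saved to a file, this can be evaluated with nix-instantiate --eval --strict. No system state is involved, which is what the pure/impure split is meant to allow.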

The following helpers are exposed via dnfLib (see dnf/lib/default.nix):

Helper                    Role
serviceUnits              DNF service → systemd unit (e.g. idm → kanidmd.service)
nodeClass                 Node class (critical / non-critical / disabled): alert-* features first, then profile
severityForClass          Class → severity (critical or warning)
hostExpectedUnits         Expected units for a host (based on its enabled services)
mkNodeRuleGroups          NodeDown, ServiceDown, SystemdUnitFailed
mkResourceRuleGroups      Disk, RAM, load, inodes, OOM (thresholds from defaultThresholds)
mkNetworkRuleGroups       Blackbox probes (gateway, tailnet, DNS)
mkMaintenanceRuleGroups   Maintenance flag (silence during rebuild)
mergeRuleGroups           Merges fragments into a single document
mkAlertRuleGroups         Shortcut: nodes + resources
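As a rough idea of what a rule-producing helper can look like, the sketch below builds one ServiceDown rule per expected unit. Everything in it is an assumption made for illustration: mkExampleRuleGroups, its arguments, the fragment shape { groups = [ … ]; }, the instance label value, and the severity mapping. The real helpers in alerts.nix may be structured differently.

```nix
# Sketch only: mkExampleRuleGroups and its arguments are hypothetical,
# not the real helpers from dnf/lib/alerts.nix.
let
  # Assumed mapping: critical nodes alert as "critical", all others as "warning".
  severityForClass = class: if class == "critical" then "critical" else "warning";

  # Build one ServiceDown rule per expected unit of a host.
  mkExampleRuleGroups = { host, class, units }: {
    groups = [
      {
        name = "example-${host}";
        rules = map (unit: {
          alert = "ServiceDown";
          # node_systemd_unit_state comes from node_exporter's systemd
          # collector; the `instance` label value is an assumption about
          # the scrape configuration.
          expr = ''node_systemd_unit_state{instance="${host}",name="${unit}",state="active"} == 0'';
          for = "5m";
          labels.severity = severityForClass class;
        }) units;
      }
    ];
  };
in
mkExampleRuleGroups {
  host = "node1";
  class = "critical";
  units = [ "example.service" ];
}
```

Because the function only maps inputs to an attribute set, its result can be compared against an expected value in a unit test without a running Prometheus.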

Each mk*RuleGroups helper returns a fragment, so the module merges them all via mergeRuleGroups and emits a single entry:

dnf/modules/service/prometheus.nix
services.prometheus.rules = [
  (builtins.toJSON (dnfLib.mergeRuleGroups (
    [ (dnfLib.mkAlertRuleGroups { inherit nodes; /* … */ }) ]
    ++ lib.optional alerting.silenceOnRebuild (dnfLib.mkMaintenanceRuleGroups { /* … */ })
    ++ lib.optional alerting.network.enable (dnfLib.mkNetworkRuleGroups { /* … */ })
  )))
];
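The merge step can be pictured with the following minimal sketch. It assumes each fragment has the shape { groups = [ … ]; } and that merging amounts to concatenating the groups lists. The mergeRuleGroups defined here is a stand-in; the real helper may also deduplicate or validate groups. The alert name DiskAlmostFull is hypothetical.

```nix
# Sketch only: this mergeRuleGroups is a stand-in for the real helper.
let
  # Assumed fragment shape: { groups = [ { name; rules; } ... ]; }
  mergeRuleGroups = fragments: {
    groups = builtins.concatLists (map (fragment: fragment.groups or [ ]) fragments);
  };

  nodeFragment = {
    groups = [
      { name = "node"; rules = [ { alert = "NodeDown"; expr = "up == 0"; } ]; }
    ];
  };

  resourceFragment = {
    groups = [
      {
        name = "resources";
        rules = [
          {
            alert = "DiskAlmostFull"; # hypothetical alert name
            expr = "node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1";
          }
        ];
      }
    ];
  };
in
# One JSON document with a single top-level `groups` key.
builtins.toJSON (mergeRuleGroups [ nodeFragment resourceFragment ])
```

Since lib.optional returns either an empty list or a one-element list, the ++ chain in the module yields exactly the list of fragments such a function expects.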
I want to…                   I touch…
monitor a new service        serviceUnits in alerts.nix
add a new rule family        a mkXRuleGroups helper, then add it to the module’s mergeRuleGroups call
change a default threshold   defaultThresholds in alerts.nix
change a node’s class        the alert-*[:zone] feature (no code)
add a new destination        the alertmanager block in prometheus.nix (receiver + route)
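For the last row, a new destination generally means one extra receiver plus one route that selects it. The fragment below is a hedged sketch using the standard NixOS option services.prometheus.alertmanager.configuration. The receiver names and the webhook URL are hypothetical, and the actual alertmanager block in prometheus.nix may be organized differently.

```nix
# Sketch only: receiver names and the webhook URL are hypothetical.
{
  services.prometheus.alertmanager.configuration = {
    route = {
      # Fallback receiver for alerts that match no child route.
      receiver = "default";
      routes = [
        {
          # Send critical alerts to the new destination.
          matchers = [ "severity=\"critical\"" ];
          receiver = "oncall-webhook";
        }
      ];
    };
    receivers = [
      { name = "default"; }
      {
        name = "oncall-webhook";
        webhook_configs = [ { url = "http://127.0.0.1:9099/alerts"; } ];
      }
    ];
  };
}
```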