
Portal:Toolforge/Admin/Logging

From Wikitech
This page is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

Logging on Toolforge, both for user workloads and for the infrastructure, is being moved to a setup based on Grafana Loki.

Overview

Storage

Log storage is handled by Grafana Loki, with persistent storage in the Ceph cluster via its S3 interface. The S3 buckets live in separate projects, tools-logging and toolsbeta-logging, because our RadosGW implementation only supports access control at the per-project level, not at any finer granularity. The buckets are created via the tofu-provisioning system.

There are two Loki deployments in each project (tools and toolsbeta):

tools
Log storage for tool workloads: everything running in an individual tool's namespace (the namespaces prefixed with tool-).
infrastructure
Log storage for Toolforge infrastructure. This covers every namespace not prefixed with tool-, except ingress-nginx-gen2.
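The namespace-to-instance rule described above can be sketched as a small Python function. This only illustrates the mapping as stated on this page; it is not the actual Alloy configuration. The function name and example namespaces are hypothetical, and because this page does not say where ingress-nginx-gen2 logs go, the sketch simply returns None for it.

```python
from typing import Optional

# Prefix used by tool namespaces (per this page).
TOOL_NAMESPACE_PREFIX = "tool-"
# Namespace excluded from the infrastructure instance (per this page).
EXCLUDED_NAMESPACES = {"ingress-nginx-gen2"}


def loki_instance_for(namespace: str) -> Optional[str]:
    """Return the Loki deployment ("tools" or "infrastructure") that stores
    logs for a Kubernetes namespace, or None if neither does.

    Hypothetical helper for illustration only.
    """
    if namespace.startswith(TOOL_NAMESPACE_PREFIX):
        return "tools"
    if namespace in EXCLUDED_NAMESPACES:
        # Destination (if any) is not documented here.
        return None
    return "infrastructure"
```

For example, loki_instance_for("tool-example") returns "tools" and loki_instance_for("kube-system") returns "infrastructure".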

Both instances are installed in the loki Kubernetes namespace.
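As a sketch of how one might query either instance from inside the cluster, the Python below builds a URL for Loki's query_range HTTP endpoint. The Service names (loki-tools, loki-infrastructure), the port (3100, Loki's usual default) and the namespace stream label are all assumptions for illustration; the real values are defined by the logging component of toolforge-deploy.

```python
from urllib.parse import urlencode

# HYPOTHETICAL Service names; the real ones come from the logging
# component of toolforge-deploy and may differ.
LOKI_SERVICES = {
    "tools": "loki-tools",
    "infrastructure": "loki-infrastructure",
}
LOKI_K8S_NAMESPACE = "loki"  # both instances are installed here
LOKI_PORT = 3100  # assumed: Loki's usual default HTTP port


def query_range_url(instance: str, namespace: str, limit: int = 100) -> str:
    """Build a query_range URL selecting logs from one Kubernetes namespace.

    Assumes log streams carry a `namespace` label (hypothetical).
    """
    host = f"{LOKI_SERVICES[instance]}.{LOKI_K8S_NAMESPACE}.svc.cluster.local"
    # LogQL stream selector, e.g. {namespace="tool-example"}
    logql = f'{{namespace="{namespace}"}}'
    params = urlencode({"query": logql, "limit": limit})
    return f"http://{host}:{LOKI_PORT}/loki/api/v1/query_range?{params}"
```

Without explicit start and end parameters, Loki's query_range defaults to roughly the last hour. If an instance runs with multi-tenancy enabled, requests also need an X-Scope-OrgID header.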

Collection

Each Kubernetes worker node runs a Grafana Alloy pod that forwards logs from pods running on that node to the appropriate Loki instance.
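To illustrate what forwarding looks like at the API level, the sketch below builds a request body in the JSON format accepted by Loki's /loki/api/v1/push endpoint: each stream is a set of labels plus a list of [nanosecond timestamp, log line] pairs. Alloy's own client may send an equivalent payload in a different encoding, and the namespace label used in the example is an assumption, not the label set Toolforge actually attaches.

```python
import json
import time
from typing import Dict, List, Optional


def build_push_payload(
    labels: Dict[str, str], lines: List[str], start_ns: Optional[int] = None
) -> str:
    """Return a JSON body for POST /loki/api/v1/push holding one stream.

    Each value is [timestamp in nanoseconds as a string, log line]; the
    timestamps are incremented by 1 ns per line to keep them ordered.
    """
    if start_ns is None:
        start_ns = time.time_ns()
    values = [[str(start_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})
```

A body built this way would be sent with the header Content-Type: application/json.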

Deployment

The entire logging stack is deployed via the logging component of toolforge-deploy.

Operations

Upgrading Loki or Alloy

Monitoring

A Grafana dashboard is available.

See also