Portal:Toolforge/Admin/Logs Service

From Wikitech

Documentation of the components and common admin procedures for the Logs Service. It is currently embedded as part of the jobs-cli (see Portal:Toolforge/Admin/Jobs Service).

Components

  • Logs API (source code): main entry point for clients (users reach it through the jobs-cli).
  • Tools Loki (source code, under components/logging): ingests and stores the pod logs so that the Logs API can retrieve them later.



Alerts

List of alerts: https://prometheus.svc.toolforge.org/tools/alerts?search=logs

Runbooks: Category:LogsApiRunbooks.

Dashboards

https://grafana-rw.wmcloud.org/d/kcAb-KUSe/logs-service-overview

Main phabricator board

https://phabricator.wikimedia.org/project/board/539/

Administrative tasks

Starting a service

Logs API

This lives in Kubernetes, behind the API gateway. To start it, try redeploying it by following Portal:Toolforge/Admin/Kubernetes/Components#Deploy (the component is logs-api).

You can monitor whether it is coming up with the usual k8s commands:

root@tools-k8s-control-9:~# kubectl get all -n logs-api
NAME                            READY   STATUS    RESTARTS        AGE
pod/logs-api-7b956f999f-k9htb   2/2     Running   1 (4d21h ago)   4d21h
pod/logs-api-7b956f999f-q6dws   2/2     Running   1 (4d21h ago)   4d21h

NAME               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/logs-api   ClusterIP   10.111.193.102   <none>        8443/TCP   4d21h

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/logs-api   2/2     2            2           4d21h

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/logs-api-7b956f999f   2         2         2       4d21h
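
Instead of re-running kubectl get all, one option is to block until the rollout finishes. This is a minimal sketch, assuming the deployment and namespace are both named logs-api as in the output above. The wait_for_logs_api helper, the KUBECTL override and the 120s timeout are illustrative choices, not part of the Toolforge tooling:

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to print the command
# instead of running it; this variable is an assumption of this sketch.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: waits until all logs-api replicas are updated and
# available, and returns non-zero if that does not happen before the timeout.
wait_for_logs_api() {
    "$KUBECTL" rollout status deployment/logs-api -n logs-api --timeout=120s
}
```

Calling wait_for_logs_api right after the redeploy returns once both replicas are available, which makes it usable in a script as well as interactively.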

Tools loki log ingestion

This is also a k8s component. To start it, follow Portal:Toolforge/Admin/Kubernetes/Components#Deploy (the component is logging).

root@tools-k8s-control-9:~# kubectl get all -n loki
NAME                                  READY   STATUS    RESTARTS   AGE
pod/loki-tools-backend-0              1/1     Running   0          43d
pod/loki-tools-backend-1              1/1     Running   0          43d
pod/loki-tools-backend-2              1/1     Running   0          43d
pod/loki-tools-read-78fcb9b8f-jtlqk   1/1     Running   0          43d
pod/loki-tools-read-78fcb9b8f-px9f7   1/1     Running   0          43d
pod/loki-tools-read-78fcb9b8f-tpdv5   1/1     Running   0          43d
pod/loki-tools-write-0                1/1     Running   0          43d
pod/loki-tools-write-1                1/1     Running   0          43d
pod/loki-tools-write-2                1/1     Running   0          43d

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/loki-tools-backend                     ClusterIP   10.106.142.0     <none>        3100/TCP,9095/TCP   111d
service/loki-tools-backend-headless            ClusterIP   None             <none>        3100/TCP,9095/TCP   111d
service/loki-tools-memberlist                  ClusterIP   None             <none>        7946/TCP            111d
service/loki-tools-query-scheduler-discovery   ClusterIP   None             <none>        3100/TCP,9095/TCP   111d
service/loki-tools-read                        ClusterIP   10.102.254.64    <none>        3100/TCP,9095/TCP   111d
service/loki-tools-read-headless               ClusterIP   None             <none>        3100/TCP,9095/TCP   111d
service/loki-tools-write                       ClusterIP   10.109.223.201   <none>        3100/TCP,9095/TCP   111d
service/loki-tools-write-headless              ClusterIP   None             <none>        3100/TCP,9095/TCP   111d

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/loki-tools-read   3/3     3            3           111d

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/loki-tools-read-78fcb9b8f   3         3         3       111d

NAME                                  READY   AGE
statefulset.apps/loki-tools-backend   3/3     111d
statefulset.apps/loki-tools-write     3/3     111d

Note that there's also a daemonset (called alloy) that starts a pod on each kubernetes worker to gather the logs and send them to the loki service:

root@tools-k8s-control-9:~# kubectl get all -n alloy
NAME              READY   STATUS    RESTARTS   AGE
pod/alloy-25n87   1/1     Running   0          23h
pod/alloy-26dl4   1/1     Running   0          20h
...
pod/alloy-zn5v7   1/1     Running   0          23h
pod/alloy-zz9jc   1/1     Running   0          22h

NAME                   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/alloy   74        74        74      74           74          <none>          111d

Stopping a service

Logs API

This is a simple deployment, so you can just delete it and recreate it.

TBD: add commands
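
Until tested commands are documented here, one possible approach is to scale the deployment to zero instead of deleting it, which keeps the deployment object in place. This is an untested sketch, not the established procedure. The helper names and the KUBECTL override are hypothetical, and the default of 2 replicas is taken from the sample output on this page:

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: stops the Logs API by removing all of its pods while
# keeping the deployment object.
stop_logs_api() {
    "$KUBECTL" scale deployment/logs-api -n logs-api --replicas=0
}

# Hypothetical helper: brings the Logs API back. The default of 2 replicas
# matches the sample output on this page; pass another number to override it.
start_logs_api() {
    "$KUBECTL" scale deployment/logs-api -n logs-api --replicas="${1:-2}"
}
```

A later redeploy of the logs-api component will most likely reset the replica count to whatever the deployment configuration specifies.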

Tools loki log ingestion

Loki deployment

TBD: Loki is a more complex app with three components: read, write and backend. read is a regular deployment, so you can delete it and recreate it later. write and backend are stateful sets; you can try deleting them too.
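
As an alternative to deleting the objects, the three Loki components could be scaled to zero and back. This is an untested sketch that has not been validated against the Toolforge deployment. The scale_loki helper and the KUBECTL override are hypothetical, and the workload names come from the kubectl get all -n loki output above:

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: sets the replica count of every Loki component.
# Usage: scale_loki <replicas>
scale_loki() {
    "$KUBECTL" scale deployment/loki-tools-read -n loki --replicas="$1"
    "$KUBECTL" scale statefulset/loki-tools-write -n loki --replicas="$1"
    "$KUBECTL" scale statefulset/loki-tools-backend -n loki --replicas="$1"
}
```

scale_loki 0 would stop everything, and scale_loki 3 would restore the three replicas per component shown in the sample output.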

Alloy

TBD: a daemonset has no replica count, so it cannot be scaled down with kubectl scale; you can probably just delete and recreate it.
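
A daemonset has no replica count, so kubectl scale does not apply to it. A common workaround is to give it a node selector that matches no node, which makes the controller remove every pod, and to drop that selector again to restart it. This is an untested sketch. The helper names, the KUBECTL override and the alloy-disabled label key are hypothetical:

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: adds a node selector that no worker satisfies, so the
# daemonset controller removes all alloy pods.
stop_alloy() {
    "$KUBECTL" patch daemonset alloy -n alloy --type merge \
        -p '{"spec":{"template":{"spec":{"nodeSelector":{"alloy-disabled":"true"}}}}}'
}

# Hypothetical helper: removes that selector key again (a null value deletes
# the key in a JSON merge patch), so pods are scheduled on every worker.
start_alloy() {
    "$KUBECTL" patch daemonset alloy -n alloy --type merge \
        -p '{"spec":{"template":{"spec":{"nodeSelector":{"alloy-disabled":null}}}}}'
}
```

While alloy is stopped, no pod logs are shipped to Loki, so the stop window should be kept short.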

Checking all components are alive

You can check the dashboard (see Dashboards above) for a high-level view.

Logs API

TBD
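
Until a proper check is documented, one possible liveness test is to compare the ready replicas of the deployment with the desired count. This is a sketch under the assumption that the deployment and namespace are both named logs-api, as in the sample output earlier on this page. The logs_api_is_healthy helper and the KUBECTL override are hypothetical:

```shell
# KUBECTL can be overridden to point at another binary or wrapper.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: succeeds only when the number of ready logs-api
# replicas equals the (non-zero) number of desired replicas.
logs_api_is_healthy() {
    desired="$("$KUBECTL" get deployment logs-api -n logs-api -o jsonpath='{.spec.replicas}')"
    ready="$("$KUBECTL" get deployment logs-api -n logs-api -o jsonpath='{.status.readyReplicas}')"
    # readyReplicas is omitted from the status when no pod is ready.
    [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "${ready:-0}" = "$desired" ]
}
```

For example, logs_api_is_healthy && echo OK prints OK only when all replicas are ready. This only checks pod readiness; it does not verify that the API answers requests through the API gateway.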

Tools loki log ingestion

TBD
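
Until a proper check is documented, a rough liveness test is to ask Kubernetes whether each workload on the ingestion path has all its pods ready. This is a sketch, using the workload names from the kubectl get all outputs earlier on this page. The check_log_ingestion helper, the KUBECTL override and the 30s timeout are hypothetical:

```shell
# KUBECTL can be overridden to point at another binary or wrapper.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: runs "rollout status" on every Loki workload and on
# the alloy daemonset, prints the ones that are not ready within the timeout,
# and returns non-zero if any of them failed.
check_log_ingestion() {
    failed=0
    for target in \
        "loki deployment/loki-tools-read" \
        "loki statefulset/loki-tools-write" \
        "loki statefulset/loki-tools-backend" \
        "alloy daemonset/alloy"
    do
        namespace="${target%% *}"   # text before the first space
        workload="${target#* }"     # text after the first space
        if ! "$KUBECTL" rollout status "$workload" -n "$namespace" --timeout=30s >/dev/null; then
            echo "NOT READY: $workload (namespace $namespace)"
            failed=1
        fi
    done
    return "$failed"
}
```

This only confirms that the pods are ready. A workload that was deliberately scaled to zero also passes, and the check does not prove that logs are actually being ingested, so the dashboard remains the better end-to-end view.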