Rsyslog

rsyslog is the default Debian logging daemon and is deployed fleet-wide at the Wikimedia Foundation.

Packaging

We currently maintain several rsyslog versions/packages for different reasons, all using the gbp build flow:

  • Rebuilds of the Debian buster upstream packages (8.1901.0) that include the mmkubernetes plugin our Kubernetes/Logging pipeline is built on
  • A backport of rsyslog 8.2008.0, used on centrallog hosts, to address issues with the Debian upstream version (task T259780, task T199406)

Branches

  • debian/buster-wikimedia-k8s: Published to component/rsyslog-k8s for buster-wikimedia; Used on Kubernetes nodes running buster.
  • debian/bullseye-wikimedia-k8s: Published to component/rsyslog-k8s for bullseye-wikimedia; Used on Kubernetes nodes running bullseye.
  • debian/stretch-wikimedia: Published to main for stretch-wikimedia; Used on Kubernetes nodes running stretch.
  • UNKNOWN: Published to component/rsyslog for buster-wikimedia; Used on centrallog nodes.
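The named branches above follow a common pattern: debian/<dist>-wikimedia, with an optional -k8s suffix for the Kubernetes builds. As a minimal sketch of that convention, the helper below derives a branch name from a distribution and an optional variant. The function name branch_for is hypothetical and not part of the packaging repository, and the pattern is only inferred from the three named branches listed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): build the packaging branch
# name from a distribution and an optional variant such as "k8s".
# The naming pattern is inferred from the branch list above.
branch_for() {
    dist="$1"
    variant="${2:-}"
    if [ -n "$variant" ]; then
        printf 'debian/%s-wikimedia-%s\n' "$dist" "$variant"
    else
        printf 'debian/%s-wikimedia\n' "$dist"
    fi
}

branch_for stretch       # prints debian/stretch-wikimedia
branch_for buster k8s    # prints debian/buster-wikimedia-k8s
```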

Build

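# Set DIST on its own line: in "DIST=stretch gbp ... $DIST" the shell expands $DIST before the assignment takes effect.
# Adapt the --git-debian-branch argument to debian/buster-wikimedia-k8s in case you want to build that
DIST=stretch
BACKPORTS=yes DIST=$DIST gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=$DIST -sa -uc -us --git-debian-branch=debian/$DIST-wikimedia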


Troubleshooting

rsyslog "stuck"

Servers to check:

 Puppet: syslog::centralserver
 Currently: centrallog1001.eqiad.wmnet and centrallog2001.codfw.wmnet (Aug 2020)

rsyslog has been observed to get stuck from time to time (its TLS listener stops responding). In these situations a restart "fixes" the problem; however, before restarting it is important to capture the daemon's state:

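  # Work from the home directory so the capture files land there
  cd
  # Trace system calls of all rsyslogd threads for up to 30 seconds
  timeout 30s strace -f -p $(pidof rsyslogd) -s 65535 -o rsyslog_$(date -Im).strace
  # List the daemon's open files and sockets
  lsof -p $(pidof rsyslogd) > rsyslog_$(date -Im).lsof
  # Write a core dump to the current directory
  gdb -p $(pidof rsyslogd) --batch -ex gcore
  # Dump full backtraces of all threads
  gdb -p $(pidof rsyslogd) --batch -ex 'thread apply all bt full' > rsyslog_$(date -Im).threaddump
  # Restart only after the captures above have completed
  systemctl restart rsyslog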
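
Each command above evaluates $(pidof rsyslogd) and $(date -Im) separately, so the capture files can end up with different timestamps if the minute rolls over. A minimal sketch of one way to keep them aligned is to resolve the PID and timestamp once and generate the command list from them. The function name print_capture_plan is hypothetical; it only prints the commands and executes nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the capture commands for one PID and one
# timestamp so that all artifacts share the same file name stem.
# Nothing is executed here; the output is meant to be reviewed first.
print_capture_plan() {
    pid="$1"
    stamp="$2"
    cat <<EOF
timeout 30s strace -f -p ${pid} -s 65535 -o rsyslog_${stamp}.strace
lsof -p ${pid} > rsyslog_${stamp}.lsof
gdb -p ${pid} --batch -ex gcore
gdb -p ${pid} --batch -ex 'thread apply all bt full' > rsyslog_${stamp}.threaddump
EOF
}

# Example with fixed placeholder values (1234 is not a real PID):
print_capture_plan 1234 2020-08-05T12:34+00:00
```

On an affected host the plan could be generated with print_capture_plan "$(pidof rsyslogd)" "$(date -Im)" and, after review, piped to sh. The restart is deliberately left out so that it stays a separate, manual step once the captures are confirmed.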