Rsyslog
rsyslog is the default Debian logging daemon and is deployed fleet-wide at the Wikimedia Foundation.
Packaging
We currently maintain several rsyslog versions/packages for different reasons, all using the gbp (git-buildpackage) build flow:
- Rebuilds of the Debian buster upstream packages (8.1901.0), including the mmkubernetes plugin that our Kubernetes/Logging pipeline is built on
- A backport of rsyslog 8.2008.0, used on centrallog hosts, to address issues with the Debian upstream version (task T259780, task T199406)
Branches
- debian/buster-wikimedia-k8s: Published to component/rsyslog-k8s for buster-wikimedia; Used on Kubernetes nodes running buster.
- debian/bullseye-wikimedia-k8s: Published to component/rsyslog-k8s for bullseye-wikimedia; Used on Kubernetes nodes running bullseye.
- debian/stretch-wikimedia: Published to main for stretch-wikimedia; Used on Kubernetes nodes running stretch.
- UNKNOWN: Published to component/rsyslog for buster-wikimedia; Used on centrallog nodes.
Build
# Adapt the --git-debian-branch argument to debian/buster-wikimedia-k8s in case you want to build that
# DIST is exported first so that $DIST expands in the arguments below
export DIST=stretch
BACKPORTS=yes gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=$DIST -sa -uc -us --git-debian-branch=debian/$DIST-wikimedia
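Because the branch name differs between the plain and the Kubernetes (-k8s) variants, it can help to print the full command before running it. The following is a minimal sketch, not part of the documented workflow: rsyslog_build_cmd is a hypothetical helper name, and it assumes the branch is selected with gbp's --git-debian-branch option and that the other flags stay the same for every branch.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the gbp invocation for one
# distribution. The optional second argument is a branch suffix such as -k8s.
rsyslog_build_cmd() {
    dist="$1"          # e.g. stretch, buster, bullseye
    suffix="${2:-}"    # e.g. -k8s; empty for the plain branch
    branch="debian/${dist}-wikimedia${suffix}"
    # Values are written out literally, so the printed line does not depend
    # on DIST being set in the calling shell.
    printf 'BACKPORTS=yes DIST=%s gbp buildpackage --git-pbuilder --git-no-pbuilder-autoconf --git-dist=%s -sa -uc -us --git-debian-branch=%s\n' \
        "$dist" "$dist" "$branch"
}

# Example: show the command for the buster Kubernetes branch.
rsyslog_build_cmd buster -k8s
```

After reviewing the printed line, it can be pasted into the shell from within the packaging checkout.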
Troubleshooting
rsyslog "stuck"
Servers to look for:
- Puppet: syslog::centralserver
- Currently: centrallog1001.eqiad.wmnet and centrallog2001.codfw.wmnet (Aug 2020)
rsyslog has been observed to get stuck from time to time (its TLS listener stops responding). In these situations a restart "fixes" the problem; however, before restarting it is important to capture the daemon's state:
cd
timeout 30s strace -f -p $(pidof rsyslogd) -s 65535 -o rsyslog_$(date -Im).strace
lsof -p $(pidof rsyslogd) > rsyslog_$(date -Im).lsof
gdb -p $(pidof rsyslogd) --batch -ex gcore
gdb -p $(pidof rsyslogd) --batch -ex 'thread apply all bt full' > rsyslog_$(date -Im).threaddump
systemctl restart rsyslog
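The capture steps call pidof and date separately for each command, so the output files can end up with different timestamps if the capture crosses a minute boundary. As a hedged sketch (rsyslog_capture_plan is a hypothetical helper, not an existing tool), the same commands can be generated from a single PID and a single timestamp, reviewed, and then executed:

```shell
#!/bin/sh
# Hypothetical helper: prints the capture commands from this section using
# one PID and one timestamp so that all output files share the same prefix.
# It only prints the plan; nothing is executed by the function itself.
rsyslog_capture_plan() {
    pid="$1"      # PID of rsyslogd, e.g. from: pidof rsyslogd
    stamp="$2"    # timestamp for file names, e.g. from: date -Im
    prefix="rsyslog_${stamp}"
    cat <<EOF
timeout 30s strace -f -p ${pid} -s 65535 -o ${prefix}.strace
lsof -p ${pid} > ${prefix}.lsof
gdb -p ${pid} --batch -ex gcore
gdb -p ${pid} --batch -ex 'thread apply all bt full' > ${prefix}.threaddump
systemctl restart rsyslog
EOF
}

# Usage (review the printed plan first, then pipe it to sh to execute it):
#   rsyslog_capture_plan "$(pidof rsyslogd)" "$(date -Im)"
#   rsyslog_capture_plan "$(pidof rsyslogd)" "$(date -Im)" | sh
```

Printing the plan first keeps the restart as the last, explicit step, so the state is captured before the daemon is restarted.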