From Wikitech

We try to avoid applications sending logs directly to Logstash. Instead, logs should pass through one of the supported logging interfaces described below.


Rsyslog has been selected as the standard host-based "logging agent" for WMF production infrastructure. This means that rsyslog is the software responsible for ingesting log messages from applications and outputting them to the Kafka-fronted ELK stack.

Logs should not be sent directly from the application to logstash. Logs should always flow through rsyslog.

As of FYQ4 2019 all non-kafka logstash inputs have been deprecated, and work is under way to remove them.

Rsyslog provides many interfaces to support the varying logging capabilities of our applications.

UNIX Socket (/dev/log)

Standard syslog messages written to /dev/log may be routed to ELK (exclusively, or in addition to any locally configured log files) by adding the relevant program name to the rsyslog lookup table at /etc/rsyslog.lookup.d/lookup_table_output.json. Here is a brief example that routes syslog messages from the program name "jenkins" to both "local" (local files) and "kafka" (the Kafka logging ELK pipeline):

{ "version": "1",
  "nomatch" : "local",
  "type" : "string",
  "table" : [
    {"index" : "jenkins", "value" : "kafka local" }
  ]
}
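The routing decision described above can be sketched in Python. This is only an illustration of the lookup semantics, not how rsyslog implements them: the helper name outputs_for is hypothetical, the table is inlined rather than read from /etc/rsyslog.lookup.d/, the entries are assumed to live under a "table" array as in rsyslog's lookup table format, and splitting the value on spaces is an assumption about how the value is interpreted.

```python
import json

# Inlined copy of the example lookup table (assumption: entries sit under "table").
LOOKUP_TABLE = json.loads("""
{ "version": "1",
  "nomatch": "local",
  "type": "string",
  "table": [
    {"index": "jenkins", "value": "kafka local"}
  ]
}
""")


def outputs_for(program, table=LOOKUP_TABLE):
    """Return the set of outputs for a program name (hypothetical helper)."""
    for entry in table["table"]:
        if entry["index"] == program:
            # Assumption: the value is a space-separated list of outputs.
            return set(entry["value"].split())
    # No entry matched: fall back to the table's "nomatch" value.
    return set(table["nomatch"].split())


print(sorted(outputs_for("jenkins")))  # ['kafka', 'local']
print(sorted(outputs_for("cron")))     # ['local']
```

A program that is not listed, such as "cron" here, falls back to the "nomatch" value and stays in local files only.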

Both structured and unstructured logs are supported. If the application supports it, structured JSON logs may be placed in the syslog msg field by prefixing them with an @cee: cookie.
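As a minimal sketch of writing to this interface from Python, the standard library's logging.handlers.SysLogHandler can target /dev/log. The program name "myapp" and the helper names cee_message and make_syslog_logger are hypothetical; the program name would still need an entry in the lookup table to reach Kafka.

```python
import json
import logging
import logging.handlers


def cee_message(fields):
    """Serialize a dict as JSON and prefix it with the @cee: cookie."""
    return "@cee: " + json.dumps(fields)


def make_syslog_logger(ident, address="/dev/log"):
    """Build a logger that writes to the local syslog socket.

    `ident` becomes the syslog program name (a value chosen by the caller).
    """
    handler = logging.handlers.SysLogHandler(address=address)
    # SysLogHandler does not add a program name on its own; prepend "ident: ".
    handler.ident = ident + ": "
    logger = logging.getLogger(ident)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


# Structured payload as it would appear in the syslog msg field.
print(cee_message({"message": "structured message"}))
# @cee: {"message": "structured message"}
```

On a host where /dev/log exists, make_syslog_logger("myapp").info(cee_message({...})) would send a structured message, and passing a plain string to info() would send an unstructured one.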

UDP listener

For applications that support UDP syslog output, local imudp listeners may be used to ingest log messages into rsyslog for processing and output to the Kafka logging ELK pipeline.

A few example services using this interface are:


An rsyslog UDP listener dubbed "udp-localhost-compat" runs on localhost:10514 on mw hosts. MediaWiki emits JSON structured logs in the @cee: cookie format to this endpoint for JSON parsing and output to the Kafka logging ELK pipeline.
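A rough sketch of what a client of such a listener might send is shown below. The port comes from the text above; everything else is an assumption for illustration: the helper names, the program name "myapp", the BSD-style "<PRI>tag: msg" framing, and the priority value. The exact framing that the udp-localhost-compat listener expects is not specified here.

```python
import json
import socket


def build_packet(program, fields, pri=134):
    """Build a minimal BSD-style syslog datagram with an @cee: JSON payload.

    pri=134 is facility local0 (16) * 8 + severity info (6), an arbitrary choice.
    """
    return ("<%d>%s: @cee: %s" % (pri, program, json.dumps(fields))).encode("utf-8")


def send_udp(packet, host="localhost", port=10514):
    """Send one datagram to a local UDP syslog listener."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


packet = build_packet("myapp", {"message": "hello"})
print(packet)  # b'<134>myapp: @cee: {"message": "hello"}'
```

Calling send_udp(packet) would then hand the datagram to the listener. UDP gives no delivery confirmation, so the call succeeds even when nothing is listening.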

Network devices

An rsyslog UDP listener exists on syslog.eqiad.wmnet and syslog.codfw.wmnet so that network devices may speak generic syslog to the syslog hosts and have their log messages relayed to ELK via the Kafka logging pipeline.

Systemd Journal (stdout/stderr)

The journald interface is effectively an extension of the /dev/log input. Log messages that services running under systemd write to stdout/stderr are picked up by journald and then forwarded to rsyslog via /dev/log.

Please see the above rsyslog lookup table for more information regarding how rsyslog decides whether or not to forward the log to the Kafka ELK logging pipeline.

If outputting JSON structured logs within a syslog message, a "cookie" is required. Prepending "@cee: " to the JSON blob is sufficient.[1]
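To illustrate why the cookie matters, here is a small sketch of the distinction a consumer can draw between structured and unstructured messages. It is not how rsyslog's JSON parsing is implemented; parse_cee is a hypothetical helper that only mirrors the idea that a message is treated as JSON when it starts with the cookie.

```python
import json

CEE_COOKIE = "@cee:"


def parse_cee(msg):
    """Return the parsed JSON object if msg carries the cookie, else None."""
    stripped = msg.lstrip()
    if not stripped.startswith(CEE_COOKIE):
        return None  # no cookie: treat as an unstructured message
    try:
        parsed = json.loads(stripped[len(CEE_COOKIE):])
    except ValueError:
        return None  # cookie present, but the payload is not valid JSON
    return parsed if isinstance(parsed, dict) else None


print(parse_cee('@cee: {"level": "INFO"}'))  # {'level': 'INFO'}
print(parse_cee('{"level": "INFO"}'))        # None
```

The second call shows that a bare JSON blob without the cookie is not recognized as structured in this sketch.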

NOTE: Log messages longer than 2048 characters are split across lines. The fix in systemd is available in Debian Buster.

Python example implementation

from datetime import datetime
import json
import logging
import sys

class ECSFormatter(logging.Formatter):
    """ECS 1.7.0 logging formatter"""
    def format(self, record):
        ecs_message = {
            'ecs.version': '1.7.0',
            'log.level': record.levelname.upper(),
            'log.origin.file.line': record.lineno,
            'log.origin.file.name': record.filename,
            'log.origin.file.path': record.pathname,
            'log.origin.function': record.funcName,
            # getMessage() applies any %-style arguments passed to the log call
            'message': record.getMessage(),
            'process.name': record.processName,
            'process.thread.id': record.process,
            'process.thread.name': record.threadName,
            'timestamp': datetime.utcnow().isoformat(),
        }
        if record.exc_info:
            ecs_message['error.stack_trace'] = self.formatException(record.exc_info)
        if not ecs_message.get('error.stack_trace') and record.exc_text:
            ecs_message['error.stack_trace'] = record.exc_text
        # Prefix "@cee" cookie indicating rsyslog should parse the message as JSON
        return "@cee: %s" % json.dumps(ecs_message)

def log_unhandled_exception(exc_type, exc_value, exc_traceback):
    """Forwards unhandled exceptions to log handler.  Override sys.excepthook to activate."""
    logging.exception(
        "Unhandled exception: %s" % exc_value, exc_info=(exc_type, exc_value, exc_traceback))

if __name__ == '__main__':
    sys.excepthook = log_unhandled_exception

    logger = logging.getLogger()
    logHandler = logging.StreamHandler()
    formatter = ECSFormatter()
    # Attach the formatter to the handler, and the handler to the root logger
    logHandler.setFormatter(formatter)
    logger.addHandler(logHandler)
    logger.setLevel(logging.INFO)

    logging.info('It\'s log!')

Configuring rsyslog to forward your logs

Rsyslog can forward logs from the journal to Logstash:

Step 1: (If needed) Your application may need to set the SyslogIdentifier option under the [Service] heading in its systemd unit file. This is especially true for applications that run under a common runtime such as Python or Java.

[Service]
SyslogIdentifier=appname

In the above example, the program field in Logstash would be appname.

Step 2: The application (per SyslogIdentifier or unit name) must then be listed in the rsyslog lookup table. Ensure kafka is in the value. Example: gerrit:849169

Tailing Log Files

When an application does not have native support for syslog output and cannot write to the journal, we have the option to tail the application's log files. This is useful for software like Apache, which expects to write to multiple log files directly. It is not the preferred solution because log rotation must be handled separately.

Step 1: The rsyslog imfile input module is used to ingest log messages from files. Example: gerrit:848547. Via Puppet:

rsyslog::input::file { 'miscweb-apache2-error':
    path => '/var/log/apache2/*error*.log',
}

In the above example, the program field in logstash would be input-file-miscweb-apache2-error.

Step 2: The application must then be listed in the rsyslog lookup table. Ensure kafka is in the value. Example: gerrit:849169

  1. https://www.rsyslog.com/files/temp/doc-indent/configuration/modules/mmjsonparse.html