Logstash/Interface

From Wikitech

We try to avoid applications sending logs directly to Logstash; instead, they should pass through one of the supported logging interfaces below.


Rsyslog

Rsyslog has been selected as the standard host-based "logging agent" for WMF production infrastructure. This means that Rsyslog is the software responsible for ingesting log messages from applications and forwarding them to the Kafka-fronted ELK stack.

Logs should not be sent directly from the application to logstash. Logs should always flow through rsyslog.

As of FYQ4 2019, all non-Kafka Logstash inputs have been deprecated, and work is under way to remove them.

Rsyslog provides many interfaces to support the varying logging capabilities of our applications.

UNIX Socket (/dev/log)

Standard syslog messages written to /dev/log may be routed to ELK (exclusively, or in addition to any locally configured log files) by adding the relevant program name to the /etc/rsyslog.lookup.d/lookup_table_output.json rsyslog lookup table. Here is a brief example which routes syslog messages from the program name "jenkins" to both "local" (local files) and "kafka" (the kafka-logging ELK pipeline):

#/etc/rsyslog.lookup.d/lookup_table_output.json

{ "version": "1",
  "nomatch" : "local",
  "type" : "string",
  "table": [
    {"index" : "jenkins", "value" : "kafka local" }
  ]
}

Both structured and unstructured logs are supported. If the application supports it, structured JSON logs may be placed in the syslog msg field using an @cee: cookie.
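As a minimal sketch of this interface, the snippet below wraps structured fields in the @cee: cookie and, where a local syslog socket exists, delivers them to /dev/log. The program name "jenkins" and the field names are only illustrative, matching the lookup table example above.

```python
import json
import logging
import os
from logging.handlers import SysLogHandler


def cee_payload(fields):
    """Wrap structured fields in the @cee: cookie so rsyslog's
    mmjsonparse module will parse the message body as JSON."""
    return '@cee: ' + json.dumps(fields)


if __name__ == '__main__':
    message = cee_payload({'message': 'build started', 'job': 'example'})
    # Only attempt delivery where a local syslog socket exists.
    if os.path.exists('/dev/log'):
        handler = SysLogHandler(address='/dev/log')
        # The ident becomes the syslog program name that the rsyslog
        # lookup table matches on (e.g. "jenkins" in the example above).
        handler.ident = 'jenkins: '
        log = logging.getLogger('cee-demo')
        log.addHandler(handler)
        log.setLevel(logging.INFO)
        log.info(message)
```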

UDP listener

For applications which support UDP syslog output, local imudp listeners may be used to ingest log messages into rsyslog for processing and output to the Kafka-logging ELK pipeline.

A few example services using this interface are:

Mediawiki

An rsyslog UDP listener dubbed "udp-localhost-compat" runs on localhost:10514 on MediaWiki hosts. MediaWiki emits JSON-structured logs in the @cee: cookie format to this endpoint for JSON parsing and output to the Kafka-logging ELK pipeline.
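As a sketch of what traffic to such a listener looks like, the snippet below builds a minimal @cee:-cookied syslog datagram and sends it over UDP. The listener here is a stand-in bound to an ephemeral port (on an actual mw host the endpoint would be 127.0.0.1:10514), and the field names are illustrative rather than MediaWiki's actual log schema.

```python
import json
import socket

# Stand-in listener on an ephemeral port; the real "udp-localhost-compat"
# listener lives at 127.0.0.1:10514 on mw hosts.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(('127.0.0.1', 0))
listener.settimeout(5)
port = listener.getsockname()[1]

# A minimal RFC 3164-style datagram: <PRI>programname: @cee: {...}
# PRI 134 = facility local0 (16) * 8 + severity info (6).
fields = {'message': 'slow query', 'channel': 'exception'}
datagram = '<134>mediawiki: @cee: ' + json.dumps(fields)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(datagram.encode('utf-8'), ('127.0.0.1', port))

received = listener.recv(4096).decode('utf-8')
sender.close()
listener.close()
```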

Network devices

An rsyslog UDP listener on 0.0.0.0:10514 exists on syslog.eqiad.wmnet and syslog.codfw.wmnet so that network devices may send generic syslog to the syslog hosts and have the log messages relayed to ELK via the Kafka logging pipeline.

Systemd Journal (stdout/stderr)

The journald interface is effectively an extension of the /dev/log input. Services running under systemd which write log messages to stdout/stderr are picked up by journald and then forwarded to rsyslog via /dev/log.

Please see the above rsyslog lookup table for more information regarding how rsyslog decides whether or not to forward the log to the Kafka ELK logging pipeline.

If outputting JSON-structured logs within a syslog message, a "cookie" is required. Prepending "@cee: " before the JSON blob is sufficient.[1]

NOTE: Log messages longer than 2048 characters are broken across lines. The fix in systemd is available in Debian Buster.

Python example implementation

logger_demo.py

import logging
import logging.config
from pythonjsonlogger import jsonlogger


class CustomJsonFormatter(jsonlogger.JsonFormatter):
    """JSON formatter which adds an upper-cased 'level' field."""

    def add_fields(self, log_record, record, message_dict):
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        log_record['level'] = record.levelname.upper()


class StructuredLoggingHandler(logging.StreamHandler):
    """Stream handler emitting JSON logs, optionally prefixed with the
    '@cee: ' cookie so rsyslog can parse the message as structured data."""

    def __init__(self, rsyslog=False):
        super(StructuredLoggingHandler, self).__init__()
        prefix = '@cee: ' if rsyslog else ''
        self.formatter = CustomJsonFormatter(prefix=prefix)


# Demo code below
if __name__ == '__main__':
    logging.config.dictConfig({
        'version': 1,
        'disable_existing_loggers': False,
        'root': {
            'handlers': ['demo'],
            'level': 'DEBUG'
        },
        'handlers': {
            'demo': {
                'class': 'logger_demo.StructuredLoggingHandler',
                'rsyslog': True
            }
        }

    })
    logging.info('It\'s log!')

Tailing Log Files

When an application has no native support for syslog output and does not lend itself well to the stdout/stderr journald model, we have the option of tailing the application's log files. This is useful for software like Apache, which expects to write to multiple log files directly.

The rsyslog imfile input module is used to ingest log messages from files. For each input file (or glob) configured, a unique "program name" is assigned following the convention "input-file-description". This program name is then added to the rsyslog lookup table in order to map the ingested log messages to the appropriate output.
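As a hedged illustration of the above, an imfile input for an Apache error log might look roughly like the following. The file path, tag, facility, and severity shown here are illustrative, not the production configuration:

```
# Illustrative sketch only -- actual paths and names differ in production.
module(load="imfile")

input(type="imfile"
      File="/var/log/apache2/error.log"
      Tag="apache2-error-log:"
      Facility="local0"
      Severity="info")
```

The value of Tag becomes the program name ("apache2-error-log", following the "input-file-description" convention) that is then added to the lookup table to route these messages to the appropriate output.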


Configuring rsyslog to forward your logs

Rsyslog needs to know that your logs should be forwarded to Kafka. There are two configuration items that must be in place.

Your application may need to set the SyslogIdentifier option under the Service heading in the systemd unit file. This is especially true for applications that run under a common runtime like Python or Java.

[Service]
SyslogIdentifier=appname

The application must also be listed in the rsyslog lookup table, with an output value that flags its logs for sending to Kafka.
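For example, for a hypothetical application with SyslogIdentifier=appname, the corresponding entry (added inside the "table" array of the lookup table shown earlier) might look like:

```
{"index" : "appname", "value" : "kafka local" }
```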


  1. https://www.rsyslog.com/files/temp/doc-indent/configuration/modules/mmjsonparse.html