User:Cwhite/Logstash/ECS Schema Guide for Developers

From Wikitech

Rationale

Software is often opinionated about what constitutes a log entry. Since WMF's centralized logging infrastructure became generally available, it has grown quickly and organically. This growth presents challenges in the storage, ingest, and presentation domains. One such issue is that there is no definition of which fields can be set, or how many, and consequently no type information is provided. Without control over the types of these fields, Elasticsearch must guess them, which makes type collisions a regular occurrence. Without control over which fields are available, fields remain largely undefined and meaningless to the outside observer. As we strive to boost signal, reduce noise, scale, simplify, and improve the user experience of the centralized logging system, we see the need to agree on a Common Logging Schema. The Observability team has evaluated options and decided to adopt the Elastic Common Schema (ECS).

Required Fields

ECS Version

ECS logs are identified by including the ECS version in the structured log event. This field is ecs.version and should contain the ECS version the log event is targeting.
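
As a minimal sketch, a log producer might tag each event like this. The nested-object layout, the helper name make_event, and the version string are assumptions for illustration; use the ECS version your events actually target.

```python
import json

# Assumed ECS version for illustration only.
ECS_VERSION = "1.6.0"


def make_event(message: str) -> dict:
    """Return a minimal structured log event tagged with its ECS version."""
    return {
        "ecs": {"version": ECS_VERSION},
        "message": message,
    }


# Serialize the event as a single JSON object, one per log line.
print(json.dumps(make_event("Cache warmed")))
```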

Common Fields

The structured log object (a JSON object) consists of a set of attributes. There are a few common attributes[1] that almost every log source will want to populate. When possible, please follow the field content recommendations in this document.

Timestamp

Ideally, the timestamp attribute contains an ISO-8601 formatted timestamp indicating the time the log was generated in UTC. This field will be translated to the native date type and moved to @timestamp.[2]

If not provided, the logging pipeline will generate the @timestamp field indicating the time it was received by the logging pipeline.
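
One way to produce such a timestamp is sketched below. The helper name utc_timestamp and the millisecond precision are assumptions, not requirements of the pipeline.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None) -> str:
    """Return an ISO-8601 timestamp in UTC, e.g. 2021-01-01T12:00:00.000Z."""
    now = now or datetime.now(timezone.utc)
    # Normalize to UTC in case a timezone-aware non-UTC time was passed in.
    now = now.astimezone(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# The producer sets "timestamp"; the pipeline moves it to "@timestamp".
event = {"timestamp": utc_timestamp(), "message": "Job finished"}
```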

Message

message is a short summary or message optimized for viewing in a log viewer.[3] When a message is not provided, it can be constructed from other fields to provide a human-readable summary of the log entry.

The message field is often the first field a user will look at when searching for diagnostic information. While there are no restrictions on what data is allowed in the message field, we recommend optimizing the field for human consumption by keeping the message short and putting diagnostic data in the proper place.[4]

How to tell if a piece of information is diagnostic data and not a good fit for the message field:

  1. Would this information be glossed over when a user reads the message?
  2. Is the piece of information useful for measurement?
  3. Is the piece of information useful to correlate with other log entries?
  4. Would it take multiple lines to render the data in the message?

If the answer to any of the above questions is "yes," consider moving the datapoint(s) to their own field as defined in the ECS documentation or to the labels object.

Common datapoints with their own fields:

  • Event (UU)IDs: event.id field.
  • Stack traces: error.stack_trace field.
  • HTTP data: http object field.
  • URL data: url object field.
  • (... this list is incomplete)
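
The sketch below shows one way to keep the message short while placing diagnostic data in the fields listed above. The function name build_error_event and the example URL are hypothetical, and the sub-fields chosen (http.request.method, url.full) are just two of the fields the ECS http and url objects define.

```python
import traceback
import uuid


def build_error_event(exc: BaseException, method: str, url: str) -> dict:
    """Keep the message short; put diagnostic data in dedicated ECS fields."""
    return {
        # One-line, human-readable summary.
        "message": f"Request failed: {type(exc).__name__}",
        "event": {"id": str(uuid.uuid4())},
        "error": {
            # The multi-line stack trace stays out of the message.
            "stack_trace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            )
        },
        "http": {"request": {"method": method}},
        "url": {"full": url},
    }


try:
    1 / 0
except ZeroDivisionError as exc:
    event = build_error_event(exc, "GET", "https://example.org/wiki/Main_Page")
```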

Log Level[5]

The log.level field is a human-readable string and is indexed as a keyword. If log.level is omitted, the logging pipeline will attempt to populate it with:

  1. The value at log.syslog.severity.name.
  2. The human-readable definition of log.syslog.severity.code.
  3. NOTSET if no other level indicator could be found.[6]
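
The fallback order above can be sketched as follows. This illustrates the described behavior and is not the pipeline's actual implementation; the function name resolve_log_level is hypothetical, and the code assumes log.syslog.severity.code arrives as an integer.

```python
# Severity names as used in the level mapping table in this document.
SEVERITY_NAMES = {
    0: "emergency",
    1: "alert",
    2: "critical",
    3: "error",
    4: "warning",
    5: "notice",
    6: "informational",
    7: "debug",
}


def resolve_log_level(event: dict) -> str:
    """Return log.level, falling back to syslog severity, then NOTSET."""
    log = event.get("log", {})
    if log.get("level"):
        return log["level"]
    severity = log.get("syslog", {}).get("severity", {})
    if severity.get("name"):
        return severity["name"]
    code = severity.get("code")
    if code in SEVERITY_NAMES:
        return SEVERITY_NAMES[code]
    return "NOTSET"
```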

For log producers that emit JSON-formatted messages and define their own level, log.level is used to populate log.syslog.severity.name and log.syslog.severity.code per this table:

Level to RFC5424 Mapping Table
Lowercase log.level     | RFC5424 definition               | Lowercase RFC5424 Severity | RFC5424 Severity code | PHP[7] | Java[8] | NodeJS[9] | Python[10] | Syslog[11]
trace, debug            | debug-level messages             | debug                      | 7                     | Yes    | Yes     | Yes       | Yes        | Yes
info, informational     | informational messages           | informational              | 6                     | Yes    | Yes     | Yes       | Yes        | Yes
notice                  | normal but significant condition | notice                     | 5                     | Yes    | Yes     | Yes       | Yes        | Yes
warning, warn           | warning conditions               | warning                    | 4                     | Yes    | Yes     | Yes       | Yes        | Yes
error, err              | error conditions                 | error                      | 3                     | Yes    | Yes     | Yes       | Yes        | Yes
critical, crit          | critical conditions              | critical                   | 2                     | Yes    | Yes     | Yes       | Yes        | Yes
alert                   | action must be taken immediately | alert                      | 1                     | Yes    | Yes     | Yes       | Yes        | Yes
emerg, emergency, fatal | system is unusable               | emergency                  | 0                     | Yes    | Yes     | Yes       | Yes        | Yes

If log.level cannot be mapped to an RFC5424 severity, then log.syslog.severity.name will be set to "alert" and log.syslog.severity.code will be set to "1".
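
The mapping table and its fallback can be expressed as a simple lookup. This is a sketch of the documented behavior rather than the pipeline's code; the names LEVEL_TO_SEVERITY and severity_for are hypothetical, and trimming whitespace before the lookup is an added assumption.

```python
# Lowercase log.level -> (RFC5424 severity name, RFC5424 severity code)
LEVEL_TO_SEVERITY = {
    "trace": ("debug", 7),
    "debug": ("debug", 7),
    "info": ("informational", 6),
    "informational": ("informational", 6),
    "notice": ("notice", 5),
    "warning": ("warning", 4),
    "warn": ("warning", 4),
    "error": ("error", 3),
    "err": ("error", 3),
    "critical": ("critical", 2),
    "crit": ("critical", 2),
    "alert": ("alert", 1),
    "emerg": ("emergency", 0),
    "emergency": ("emergency", 0),
    "fatal": ("emergency", 0),
}


def severity_for(level: str) -> dict:
    """Map a log.level string to syslog severity; unmapped levels become alert."""
    name, code = LEVEL_TO_SEVERITY.get(level.strip().lower(), ("alert", 1))
    return {"name": name, "code": code}
```

For example, severity_for("WARN") yields the warning severity with code 4, while an unrecognized level such as "verbose" falls back to alert with code 1.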

Service Name[12]

service.name is a combination of service and cluster. The intent of this field is to indicate not just the service that emitted the log entry, but also the cluster in the overall system that the log came from.

  • For Kubernetes: this is the namespace name.
  • For all others: this is usually the application name and cluster concatenated with a hyphen (-).

Examples:

  • elasticsearch-logging
  • blazegraph-wdqs
  • elasticsearch-wdqs
  • mediawiki-api_appserver
  • mediawiki-jobrunner
  • memcached-memcached_gutter
  • memcached-memcached

It is important to have meaningful and clear cluster names to avoid confusion around the concatenated service name and cluster.
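
A small sketch of the naming convention follows. The function name service_name and its parameters are hypothetical, and returning the bare application name when no cluster is given is an assumption, since the convention above only says the two are usually concatenated.

```python
def service_name(application: str, cluster: str = "", namespace: str = "") -> str:
    """Build service.name from the Kubernetes namespace or application and cluster."""
    if namespace:
        # Kubernetes: the namespace name is used as-is.
        return namespace
    # All others: application and cluster joined with a hyphen.
    return f"{application}-{cluster}" if cluster else application


print(service_name("memcached", "memcached_gutter"))
print(service_name("mediawiki", "jobrunner"))
```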

Service Type[12]

service.type is the application name.

  • For Kubernetes: this is the app label.
  • For all others: this is the application name.

Examples:

  • elasticsearch
  • kafka
  • blazegraph
  • mediawiki
  • restbase

Diagnostic Data

A log entry often needs diagnostic data to accompany it. Diagnostic data gives the log entry context, more detail, and sometimes a path to reproduction. ECS defines fields for this purpose.

Hostname

Use host.name and the related fields in the host object.

URL Object

See URL object docs.

HTTP Object

See HTTP object docs.

Custom Fields

ECS defines the labels field for custom key-value data.

The labels field does not support nested objects. All keys and values are stored as keyword.
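
Because labels is flat and keyword-only, nested custom data has to be flattened and its values converted to strings before it is stored there. The sketch below shows one way to do that. The function name to_labels and the choice of an underscore as the key separator are assumptions, not a convention defined by ECS or this guide.

```python
def to_labels(data: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into single-level keys with string values."""
    labels = {}
    for key, value in data.items():
        name = f"{prefix}_{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse so that no nested object reaches the labels field.
            labels.update(to_labels(value, name))
        else:
            # Everything is stored as keyword, so stringify the value.
            labels[name] = str(value)
    return labels


event = {
    "message": "Job retried",
    "labels": to_labels({"job": {"type": "refreshLinks", "attempt": 2}, "wiki": "enwiki"}),
}
```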

Deprecated Fields

These fields are commonly used, but have no clear analogue in ECS.

Channel

Use log.logger, event.module, or a custom label in the labels object.

Type

Use service.type and/or service.name.

Program

Use service.type and/or service.name.

Missing Fields

HTTP Headers

As of this writing (ECS 1.6.0), ECS defines no suitable place for HTTP headers. (See this PR).

Notes

  1. The terms "attribute" and "field" are used interchangeably.
  2. Presence of the timestamp field (without the @) in Kibana indicates a problem in the logging pipeline and must be rectified.
  3. In Kibana, 180 characters shows comfortably on one line on a 1920x1080 widescreen monitor.
  4. The message field is analyzed as a natural language text type. This means that the message field will be:
    1. tokenized -- the text is broken up on whitespace, stop words, and optionally non-letter characters
    2. filtered -- the tokens are lowercased and stemmed: based on a set of rules, the base word is extracted from each word. For example, "running" is stemmed to "run" and "browsers" is stemmed to "browser".
    3. indexed -- the filtered tokens are then indexed into an inverted index indicating which documents the token can be found in.
  5. The WMF uses a number of programming languages in production. Each programming language has its own opinion on how to indicate logging level. The logging level can also be customized by the developer, further complicating the task of finding errors. We see the need to agree on a defined set of log levels to make it easier for log consumers, who are not always familiar with the programming language or developer preferences, to find what they need. The Observability team has decided to standardize on RFC5424 Syslog severity.
  6. NOTSET indicates a problem either in the log producer or the pipeline and must be rectified.
  7. https://www.php-fig.org/psr/psr-3/
  8. https://en.wikipedia.org/wiki/Log4j#Log4j_log_levels
  9. https://github.com/trentm/node-bunyan#levels
  10. https://docs.python.org/3/library/logging.html#levels
  11. https://tools.ietf.org/html/rfc5424#section-6.2.1
  12. In some cases, this field can be generated by the pipeline.