User:Cwhite/Logstash/ECS Schema Guide for Developers

Rationale

Software is often opinionated about what constitutes a log entry. Since WMF's centralized logging infrastructure became generally available, it has experienced incredible organic growth. This growth presents challenges in the storage, ingest, and presentation domains. One such issue is that there is no definition of which fields can be set, or how many, and consequently no type information is provided for them. Without control over the types of these fields, Elasticsearch must guess each type, which makes type collisions a regular occurrence. Without control over which fields are available, fields remain largely undefined and meaningless to the outside observer. As we strive to boost signal, reduce noise, scale, simplify, and improve the user experience of the centralized logging system, we see the need to agree on a Common Logging Schema. The Observability team has evaluated options and decided to adopt the Elastic Common Schema (ECS).

Required Fields

ECS Version

ECS logs are identified by including the ECS version in the structured log event. This field is ecs.version and should contain the ECS version the log event is targeting.
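As a minimal sketch, an event carrying this field could be built as follows. The version string and the `make_event` helper are placeholders for illustration, and writing the dotted field name as a nested JSON object is an assumption about how the producer serializes events.

```python
import json

# Placeholder only: use the ECS version your log events actually target.
ECS_VERSION = "1.7.0"


def make_event(message):
    """Return a minimal structured log event tagged with its ECS version."""
    return {
        # ecs.version expressed as a nested object (assumed serialization).
        "ecs": {"version": ECS_VERSION},
        "message": message,
    }


print(json.dumps(make_event("Service started")))
```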

Common Fields

The structured log object (a JSON object) consists of a set of attributes. There are a few common attributes^[1] that almost every log source will want to populate. When possible, please follow the field content recommendations in this document.

Timestamp

Ideally, the timestamp attribute contains an ISO-8601 formatted timestamp indicating the time the log was generated in UTC. This field will be translated to the native date type and moved to @timestamp.^[2]

If not provided, the logging pipeline will generate the @timestamp field indicating the time it was received by the logging pipeline.
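One way to produce such a timestamp in Python is sketched below. The helper name `utc_timestamp` and the millisecond precision are illustrative choices, not requirements stated in this guide.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Return an ISO-8601 timestamp in UTC with millisecond precision."""
    # Default to the current time; callers may pass an aware datetime instead.
    now = now or datetime.now(timezone.utc)
    # Convert to UTC so the offset in the output is always +00:00.
    return now.astimezone(timezone.utc).isoformat(timespec="milliseconds")


event = {
    "timestamp": utc_timestamp(),
    "message": "Cache warmed",
}
```

Setting the timestamp at the source records when the log was generated rather than when the pipeline received it.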

Message

message is a short summary or message optimized for viewing in a log viewer.^[3] When a message is not provided, it can be constructed from other fields to provide a human-readable summary of the log entry.

The message field is often the first field a user will look at when searching for diagnostic information. While there are no restrictions on what data is allowed in the message field, we recommend optimizing the field for human consumption by keeping the message short and putting diagnostic data in the proper place.^[4]

How to tell if a piece of information is diagnostic data and not a good fit for the message field:

Would this information be glossed over when a user reads the message?
Is the piece of information useful for measurement?
Is the piece of information useful to correlate with other log entries?
Would it take multiple lines to render the data in the message?

If the answer to any of the above questions is "yes," consider moving the datapoint(s) to their own field as defined in the ECS documentation or to the labels object.

Common datapoints with their own fields:

Event (UU)IDs: event.id field.
Stack traces: error.stack_trace field.
HTTP data: http object field.
URL data: url object field.
(... this list is incomplete)
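The sketch below illustrates the recommendation: the message stays on one short line while the event ID, stack trace, HTTP data, and URL data go to their own fields. The `build_error_event` helper is hypothetical, and the choice of http.request.method and url.path as the sub-fields to populate is an assumption made for this example.

```python
import traceback
import uuid


def build_error_event(exc, method, url_path):
    """Build a log event that keeps diagnostic data out of the message."""
    stack = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return {
        # Short, human-readable summary only.
        "message": f"Request failed: {type(exc).__name__}",
        # Useful for correlating with other log entries.
        "event": {"id": str(uuid.uuid4())},
        # Multi-line data that would clutter the message.
        "error": {"stack_trace": "".join(stack)},
        # Request details that are useful for measurement.
        "http": {"request": {"method": method}},
        "url": {"path": url_path},
    }
```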

Log Level^[5]

The log.level field is a human-readable string and is indexed as a keyword. If log.level is omitted, the logging pipeline will attempt to populate it with:

The value at log.syslog.severity.name.
The human-readable definition of log.syslog.severity.code.
NOTSET if no other level indicator could be found.^[6]

For log producers that emit JSON-formatted messages and define their own level, log.level is used to populate log.syslog.severity.name and log.syslog.severity.code per this table:

Level to RFC5424 Mapping Table
Lowercase `log.level`	RFC5424 definition	Lowercase RFC5424 Severity	RFC5424 Severity code
trace, debug	debug-level messages	debug	7
info, informational	informational messages	informational	6
notice	normal but significant condition	notice	5
warning, warn	warning conditions	warning	4
error, err	error conditions	error	3
critical, crit	critical conditions	critical	2
alert	action must be taken immediately	alert	1
emerg, emergency, fatal	system is unusable	emergency	0

If log.level cannot be mapped to an RFC5424 severity, then log.syslog.severity.name will be set to "alert" and log.syslog.severity.code will be set to "1".
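The mapping table and its fallback can be expressed as a small lookup. This is an illustrative sketch that mirrors the table, not the pipeline's actual implementation. The function name `syslog_severity` is hypothetical, and returning the code as an integer is an assumption.

```python
# (severity name, severity code) for each lowercase log.level in the table.
SEVERITY_BY_LEVEL = {
    "trace": ("debug", 7),
    "debug": ("debug", 7),
    "info": ("informational", 6),
    "informational": ("informational", 6),
    "notice": ("notice", 5),
    "warning": ("warning", 4),
    "warn": ("warning", 4),
    "error": ("error", 3),
    "err": ("error", 3),
    "critical": ("critical", 2),
    "crit": ("critical", 2),
    "alert": ("alert", 1),
    "emerg": ("emergency", 0),
    "emergency": ("emergency", 0),
    "fatal": ("emergency", 0),
}

# Levels that cannot be mapped fall back to "alert" with code 1.
UNMAPPED = ("alert", 1)


def syslog_severity(level):
    """Return (severity name, severity code) for a producer-defined level."""
    # The table is keyed on the lowercase form of log.level.
    return SEVERITY_BY_LEVEL.get(str(level).lower(), UNMAPPED)
```

Because unmapped levels are treated as "alert", producers with custom level names may want to translate them to one of the listed values before emitting the event.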

Service Name^[12]

service.name is a combination of service and cluster. The intent of this field is to indicate not just the service that emitted the log entry, but also which cluster in the overall system the log came from.

For Kubernetes: this is the namespace name.
For all others: this is usually the application name and cluster concatenated with a hyphen (-).

Examples:

elasticsearch-logging
blazegraph-wdqs
elasticsearch-wdqs
mediawiki-api_appserver
mediawiki-jobrunner
memcached-memcached_gutter
memcached-memcached

It is important to have meaningful and clear cluster names to avoid confusion around the concatenated service name and cluster.

Service Type^[12]

service.type is the application name.

For Kubernetes: this is the app label.
For all others: this is the application name.

Examples:

elasticsearch
kafka
blazegraph
mediawiki
restbase
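For log sources outside Kubernetes, the two service fields could be derived as sketched below. The `service_fields` helper is hypothetical and assumes the application name and cluster name are already known to the producer.

```python
def service_fields(application, cluster):
    """Return the ECS service object for a non-Kubernetes log source."""
    return {
        "service": {
            # Application name and cluster, joined with a hyphen.
            "name": f"{application}-{cluster}",
            # The application name on its own.
            "type": application,
        }
    }


fields = service_fields("mediawiki", "jobrunner")
print(fields["service"]["name"], fields["service"]["type"])
```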

Diagnostic Data

A log entry often needs diagnostic data to accompany it. Diagnostic data gives the log entry context, more detail, and sometimes a path to reproduction. ECS defines fields for this purpose.

Hostname

Use host.name and the related fields in the host object.

URL Object

See URL object docs.

HTTP Object

See HTTP object docs.

Custom Fields

ECS defines the labels field for custom key-value data.

The labels field does not support nested objects. All keys and values are stored as keyword.
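Since nested objects are not supported, a producer holding nested custom data has to flatten it first. The sketch below shows one possible approach. The `to_labels` helper, the underscore used to join key names, and the use of `str()` to convert values are all assumptions, not conventions defined by this guide.

```python
def to_labels(data, prefix=""):
    """Flatten nested custom data into a single-level dict of strings."""
    labels = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Fold nested keys into the parent key name (assumed separator: "_").
            labels.update(to_labels(value, prefix=f"{name}_"))
        else:
            # Values are stored as keyword, so emit them as strings.
            labels[name] = str(value)
    return labels


event = {
    "message": "Job finished",
    "labels": to_labels({"job": {"id": 42, "queue": "default"}, "retry": True}),
}
```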

Deprecated Fields

These fields are commonly used, but have no clear analogue in ECS.

Channel

Use log.logger, event.module, or a custom label in the labels object.

Type

Use service.type and/or service.name.

Program