Distributed tracing/Tutorial/Instrumenting your own application
This page provides examples of how to instrument your application code, focusing on MediaWiki and Node.js services.
Instrumenting MediaWiki code (core or extensions)
First, you'll need access to a TracerInterface object. Usually you would get one from a MediaWikiServices object, or you might wire one up in ServiceWiring.php -- if you aren't sure, ask a MediaWiki expert.
Once you have a TracerInterface -- which might be a NoopTracer or a real Tracer, depending on whether the request is being sampled for tracing -- the next thing you'll want to do is create a Span.
A Span is a span of time, with a start and an end timestamp, a name, perhaps some freeform key/value attributes, and also optionally a parent. Spans should correspond to some sort of operation you want to track: a database query, an RPC to another service, a computation that might last several milliseconds or more, etc. The name should be human-readable but concise, and not high in cardinality. The official OTel Tracing spec offers more advice on naming.
$span = $tracer->createSpan('Extension:Chart rendering');
This new span will automatically be attached to the most recently 'activated' span -- these are spans that represent synchronous, longer-running operations that are expected to produce many child spans. Examples of activated spans already in the codebase are holding PoolCounter locks, or running entry points, etc. Because entry points all create a span, you can almost always expect to have an activated parent span already available when writing MediaWiki code.
In this hypothetical case, we'll activate our new span, as it is a blocking operation that we expect to make database calls and call RPCs, and we'd like to associate those all with this span.
$span->start(); // time can be overridden, but defaults to now
$span->activate();
If you prefer, you can also use a fluent interface to set lots of data at once:
$span->setSpanKind(SpanInterface::SPAN_KIND_SERVER)
    ->setAttributes(['wiki' => $wiki, 'page' => $pageTitle, 'rev_id' => $revision])
    ->start()->activate();
Creating the span should happen right as you are about to begin the actual processing you're performing.
Since we've activated this span, any child spans created by other code will automatically be associated with it.
You can and should also append other attributes to your span as you perform processing. Another thing you can do is set the status of your span. Here's an example of annotating both in a span that encounters a critical exception that breaks the request:
catch (Exception $e) {
    // Exception::$message is protected, so read it through getMessage().
    $span->setAttributes(['exception.message' => $e->getMessage()])
        ->setSpanStatus(SpanInterface::SPAN_STATUS_ERROR);
}
Errors are specially highlighted with a red warning icon in Jaeger, so they stand out, and they are also searchable.
Once the operation you'd like to track is finished, you should both end() and deactivate() your span. MediaWiki will attempt to do this for you, but it's better to be explicit. It's usually best to do this in a finally {} block.
finally {
    $span->end()->deactivate();
}
Instrumenting a Node.js service
The recommended approach for instrumenting a Node.js service at WMF is to adopt service-utils. It provides an x-request-id trace context propagator that integrates with the OpenTelemetry Node SDK and is easily combined with the SDK's W3CTraceContextPropagator to provide W3C trace header propagation.
Node SDK initialization
Before you can instrument your application, you need to initialize the Node SDK.
Importantly, the SDK must be initialized before any module of your application code is imported. The most common technique to achieve this is to introduce an instrumentation script that is loaded before your main application script, using the --require (CJS) or --import (ESM) flags.
An alternative to modifying your node command-line flags
If there are constraints that make it challenging to change the command-line flags passed to …
A minimal (CJS) instrumentation script suitable for use in production at WMF might look like the following (using service-utils v2.0.0 or later, when the @wikimedia/service-utils/otel subpath export is available):
const { NodeSDK } = require( '@opentelemetry/sdk-node' );
const { getNodeAutoInstrumentations } = require( '@opentelemetry/auto-instrumentations-node' );
const { propagation } = require( '@opentelemetry/api' );
const { CompositePropagator, W3CTraceContextPropagator } = require( '@opentelemetry/core' );
const { XRequestPropagator } = require( '@wikimedia/service-utils/otel' );
propagation.setGlobalPropagator(
    new CompositePropagator( {
        propagators: [
            new W3CTraceContextPropagator(),
            new XRequestPropagator(),
        ],
    } )
);
const sdk = new NodeSDK( {
    instrumentations: [ getNodeAutoInstrumentations() ],
} );
} );
sdk.start();
and, if saved as instrumentation.js, your node command-line arguments might change like so:
-node ./server.js
+node --require ./instrumentation.js ./server.js
In addition to creating and initializing the NodeSDK instance, and thus also ensuring that a TracerProvider is ready for use in the next section, this achieves two important goals:
- It configures trace context propagation in a way that complies with the requirements in Propagating trace context.
- It configures Node auto-instrumentation, which instruments key modules and packages (e.g., the Node http module) in a way that enables extraction and injection of trace context headers in order to support context propagation.
You can find more detail on the wide array of instrumentations added by auto-instrumentation (the second item above) in the Node auto-instrumentation docs. This includes a number of packages in wide use at WMF (e.g., Express), so it's recommended to review the instrumentations relevant to your service to see what they offer.
An alternative to using getNodeAutoInstrumentations()
If the full set of instrumentations is not desirable for your use case, you can configure a minimal …
Once SDK initialization is complete, you are ready to add instrumentation to your application code.
Adding Spans to your application
To create spans, you first need to acquire a Tracer. The simplest way to do this is via getTracer, which uses the global TracerProvider created during SDK initialization:
const { trace } = require( '@opentelemetry/api' );
const tracer = trace.getTracer( 'my-instrumentation-scope', 'my-instrumentation-version' );
The first argument is a unique identifier for some logical unit of application code (e.g., a module; see Instrumentation scope) and is required, while the version identifier is optional, but recommended in contexts where there is a natural notion of version.
With a Tracer, you are ready to create Spans that represent a span of time executing some logical operation. There are two ways to do this:
- tracer.startSpan will create a new span that is a child of the most recently activated (i.e., ongoing) span, if any, from the context, but will not modify the context (nor activate the span). Note that the most recently activated span may not have been started by you - e.g., when handling an HTTP request, it may be a span started by auto-instrumentation of the http module.
- tracer.startActiveSpan is similar, but will modify the context and activate the span, making any span created while it remains active its child.
Unless you are creating spans representing concurrent operations that share the same parent, startActiveSpan is generally what you want. This method has a number of overloads, but the most common involves passing a name identifying the operation that the span represents and a function implementing the operation, which is invoked with an (activated) Span instance. The return value of that function is returned by startActiveSpan.
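For the concurrent case mentioned above, a minimal sketch using startSpan might look like the following. The traceConcurrently helper and the shape of its operations argument are hypothetical and not part of the SDK; the tracer is passed in as a parameter and only its startSpan method is used.

```javascript
// Hypothetical helper (not part of the OpenTelemetry API): runs several
// independent operations concurrently, giving each its own span.
// startSpan does not activate the span it creates, so every span here is
// parented to whatever span was active when traceConcurrently was called,
// which makes them siblings rather than a parent/child chain.
function traceConcurrently( tracer, operations ) {
    return Promise.all( operations.map( async ( { name, run } ) => {
        const span = tracer.startSpan( name );
        try {
            return await run();
        } finally {
            // End the span whether the operation resolved or rejected.
            span.end();
        }
    } ) );
}
```

With a tracer obtained from getTracer, this could be called as traceConcurrently( tracer, [ { name: 'fetch-a', run: fetchA }, { name: 'fetch-b', run: fetchB } ] ), where fetchA and fetchB are placeholder functions. Because these spans are never activated, spans created inside each run function will not become their children.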
Importantly, your code must explicitly end a span when work is complete. In startActiveSpan, this happens within the function body implementing the operation. Taken together, a simple span life cycle might look like the following:
const result = tracer.startActiveSpan( 'my-operation', ( span ) => {
    try {
        // Perform an operation that may throw, return a result.
    } finally {
        span.end();
    }
} );
Note that you do not need to explicitly deactivate the span - startActiveSpan manages this for you.
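When the operation is asynchronous, the callback can be an async function. In that case startActiveSpan returns the callback's promise, and the span should be ended only once that promise settles. A minimal sketch, assuming a hypothetical traceAsync helper that receives the tracer as a parameter:

```javascript
// Hypothetical helper (not part of the OpenTelemetry API): runs an async
// operation inside an active span and ends the span only after the
// operation's promise has settled.
function traceAsync( tracer, name, operation ) {
    return tracer.startActiveSpan( name, async ( span ) => {
        try {
            // Awaiting here keeps span.end() from running too early.
            return await operation( span );
        } finally {
            span.end();
        }
    } );
}
```

Usage might look like const body = await traceAsync( tracer, 'fetch-page', () => fetchPage() ), where fetchPage is a placeholder. This assumes the SDK's default context manager is in use, in which case the span is expected to remain active across await points.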
When working with a Span instance, you have the ability to annotate the span with attributes - i.e., key-value pairs that imbue the span with additional detail about the operation it represents. You can do this with the setAttribute or setAttributes methods, e.g.
tracer.startActiveSpan( 'my-operation', ( span ) => {
    span.setAttributes( { 'my-attribute': 'my-attribute-value' } );
    // Perform an operation ...
} );
Attributes are frequently known at span creation time, so instead of calls to setAttributes, you can use a startActiveSpan overload that supports an options object as its second argument containing an attributes property. Another common property is the span kind, provided via the kind property. For example:
const { SpanKind } = require( '@opentelemetry/api' );
tracer.startActiveSpan( 'my-operation', {
    attributes: { 'my-attribute': 'my-attribute-value' },
    kind: SpanKind.SERVER, // An operation executed while handling a client's request
}, ( span ) => {
    // Perform an operation ...
} );
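Since span names should stay low in cardinality, values that vary per request are usually better carried as attributes than embedded in the name. A small sketch of that split, where describePageRender and the 'render page' name are hypothetical and the attribute keys are borrowed from the MediaWiki example earlier on this page:

```javascript
// Hypothetical helper: builds the name and options for a span.
// The name is a constant, so it stays low-cardinality; the per-request
// values go into attributes, where they remain available on the span.
function describePageRender( wiki, pageTitle, revId ) {
    return {
        name: 'render page',
        options: {
            attributes: { wiki: wiki, page: pageTitle, rev_id: revId },
        },
    };
}
```

The result can then be passed along as tracer.startActiveSpan( name, options, ( span ) => { ... } ).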
Finally, you can also imbue a span with a status representing the outcome of the operation, as well as exception information if relevant. Revisiting the earlier example:
const { SpanStatusCode } = require( '@opentelemetry/api' );
const result = tracer.startActiveSpan( 'my-operation', ( span ) => {
    try {
        // Perform an operation that may throw, return a result.
    } catch ( e ) {
        span.setStatus( { code: SpanStatusCode.ERROR } ); // There is also a string 'message' property.
        if ( e instanceof Error ) {
            span.recordException( e );
        }
    } finally {
        span.end();
    }
} );
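The example above handles the exception inside the callback, so the caller never sees it. When the caller also needs to observe the failure, one option is to record the error on the span and then rethrow. The sketch below assumes a hypothetical createSpanRunner factory that is handed the '@opentelemetry/api' module as its api argument; it only covers synchronous operations.

```javascript
// Hypothetical factory (not part of the OpenTelemetry API). `api` is
// expected to be the '@opentelemetry/api' module, passed in rather than
// required here so that the helper can be exercised with a stub.
function createSpanRunner( api, scopeName ) {
    const tracer = api.trace.getTracer( scopeName );
    return function runInSpan( name, operation ) {
        return tracer.startActiveSpan( name, ( span ) => {
            try {
                return operation( span );
            } catch ( e ) {
                span.setStatus( {
                    code: api.SpanStatusCode.ERROR,
                    message: e instanceof Error ? e.message : String( e ),
                } );
                if ( e instanceof Error ) {
                    span.recordException( e );
                }
                // Rethrow so the caller still sees the failure.
                throw e;
            } finally {
                // Runs on both the success and the failure path.
                span.end();
            }
        } );
    };
}
```

A service might create one runner per module, e.g. const runInSpan = createSpanRunner( require( '@opentelemetry/api' ), 'my-instrumentation-scope' ), and then call runInSpan( 'my-operation', () => doWork() ), where doWork is a placeholder.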
While this only covers a very small subset of the Node SDK's tracing API, it should give a flavor of what is possible, equivalent to the discussion above for MediaWiki. See additional resources linked below for more detailed documentation.
Configuration
Refer to Configuring your application for guidance on how to emit trace data to the OpenTelemetry Collector.
You can also set the OTEL_TRACES_EXPORTER environment variable to console in order to emit trace data to stdout, as a simpler alternative to running a local collector.
Additional resources
The OpenTelemetry documentation provides an excellent complement to the discussion above, particularly:
- The Node SDK getting-started guide
- The SDK instrumentation guide - The section on Tracing is helpful in demonstrating how to put core tracing concepts into practice.
- The SDK API documentation