The Tracing API consists of a few main classes:
- Tracer is used for all operations. See Tracer section.
- Span is a mutable object storing information about the current operation execution. See Span section.
While languages and platforms have different ways of representing data, this section defines some generic requirements for this API.
OpenTelemetry can operate on time values up to nanosecond (ns) precision. The representation of those values is language specific.
A timestamp is the time elapsed since the Unix epoch.
- The minimal precision is milliseconds.
- The maximal precision is nanoseconds.
A duration is the elapsed time between two events.
- The minimal precision is milliseconds.
- The maximal precision is nanoseconds.
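The precision bounds above can be illustrated with a minimal Python sketch. It assumes timestamps are held as integer nanoseconds since the Unix epoch; the helper names (`now_ns`, `to_millis`) are hypothetical and not part of any OpenTelemetry API.

```python
import time

NANOS_PER_MILLI = 1_000_000


def now_ns() -> int:
    """Timestamp: nanoseconds elapsed since the Unix epoch (assumed representation)."""
    return time.time_ns()


def to_millis(nanos: int) -> int:
    """Truncate a nanosecond value to the minimal (millisecond) precision."""
    return nanos // NANOS_PER_MILLI


# Duration: elapsed time between two events. A monotonic clock is used here
# so the difference cannot be negative; that choice is an assumption.
begin = time.monotonic_ns()
finish = time.monotonic_ns()
duration_ns = finish - begin
```

Keeping nanoseconds internally and truncating only on export is one way to satisfy both bounds; a language without a nanosecond clock would simply store coarser values.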
The OpenTelemetry library achieves in-process context propagation of Spans by
way of the Tracer.
The Tracer is responsible for tracking the currently active Span, and
exposes methods for creating and activating new Spans. The Tracer is
configured with Propagators which support transferring span context across
process boundaries.
New Tracer instances can be created via a TracerFactory and its getTracer
method. This method expects two string arguments:
- name (required): This name must identify the instrumentation library (also referred to as integration, e.g. io.opentelemetry.contrib.mongodb) and not the instrumented library. If an invalid name (null or empty string) is specified, a working default Tracer implementation is returned as a fallback rather than returning null or throwing an exception. A library implementing the OpenTelemetry API may also ignore this name and return a default instance for all calls if it does not support "named" functionality (e.g. an implementation which is not even observability-related). A TracerFactory could also return a no-op Tracer here if application owners configure the SDK to suppress telemetry produced by this library.
- version (optional): Specifies the version of the instrumentation library (e.g. semver:1.0.0).
TracerFactorys are generally expected to be used as singletons. Implementations
SHOULD provide a single global default TracerFactory.
Some applications may use multiple TracerFactory instances, e.g. to provide
different settings (e.g. SpanProcessors) to each of those instances and -
in further consequence - to the Tracer instances created by them.
Implementations might require the user to specify configuration properties at
TracerFactory creation time, or rely on external configuration, e.g. when using the
provider pattern.
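As a rough illustration of the getTracer contract described above, the following Python sketch returns a working default Tracer for an invalid name instead of returning null or raising. The class shapes, the per-(name, version) cache, and the `global_tracer_factory` helper are assumptions made for this example, not a prescribed design.

```python
from typing import Dict, Optional, Tuple


class Tracer:
    """Hypothetical minimal Tracer that only remembers who requested it."""

    def __init__(self, name: str, version: Optional[str] = None) -> None:
        self.name = name
        self.version = version


class TracerFactory:
    """Hypothetical factory; caching one Tracer per (name, version) is an assumption."""

    def __init__(self) -> None:
        self._tracers: Dict[Tuple[str, Optional[str]], Tracer] = {}
        self._default = Tracer("default")  # fallback for invalid names

    def get_tracer(self, name: Optional[str], version: Optional[str] = None) -> Tracer:
        if not name:
            # Null or empty name: return a working default, never None.
            return self._default
        key = (name, version)
        if key not in self._tracers:
            self._tracers[key] = Tracer(name, version)
        return self._tracers[key]


# A single global default factory, used as a singleton.
_GLOBAL_FACTORY = TracerFactory()


def global_tracer_factory() -> TracerFactory:
    return _GLOBAL_FACTORY
```

An application that needs different settings per component could still construct additional `TracerFactory` instances alongside the global one.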
Runtimes that support multiple deployments or applications might need to
provide a different TracerFactory instance to each deployment. To support this,
the global TracerFactory registry may delegate calls to create new instances of
TracerFactory to a separate Provider component, and the runtime may include
its own Provider implementation which returns a different TracerFactory for
each deployment.
Provider instances are registered with the API via some language-specific
mechanism, for instance the ServiceLoader class in Java.
The Tracer MUST provide methods to:
- Get the currently active Span
- Create a new Span
- Make a given Span active
The Tracer SHOULD allow end users to configure other tracing components that
control how Spans are passed across process boundaries, including the binary
and text format Propagators used to serialize Spans created by the
Tracer.
When getting the current span, the Tracer MUST return a placeholder Span
with an invalid SpanContext if there is no currently active Span.
When creating a new Span, the Tracer MUST allow the caller to specify the
new Span's parent in the form of a Span or SpanContext. The Tracer
SHOULD create each new Span as a child of its active Span unless an
explicit parent is provided or the option to create a span without a parent is
selected, or the current active Span is invalid.
The Tracer MUST provide a way to update its active Span, and MAY provide
convenience methods to manage a Span's lifetime and the scope in which a
Span is active. When an active Span is made inactive, the previously-active
Span SHOULD be made active. A Span may be finished (i.e. have a non-null end
time) but still active. A Span may be active on one thread after it has been
made inactive on another.
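The active-Span rules above can be sketched in Python with `contextvars`, which keeps a separate current value per thread or task. This is only one possible mechanism; `Span`, `INVALID_SPAN`, `get_current_span`, and `use_span` are hypothetical names introduced for illustration.

```python
import contextvars
from contextlib import contextmanager


class Span:
    """Hypothetical Span; `valid` stands in for a valid SpanContext."""

    def __init__(self, name: str, valid: bool = True) -> None:
        self.name = name
        self.valid = valid


# Placeholder returned when no Span is active (invalid SpanContext).
INVALID_SPAN = Span("", valid=False)

_active = contextvars.ContextVar("active_span", default=INVALID_SPAN)


def get_current_span() -> Span:
    return _active.get()


@contextmanager
def use_span(span: Span):
    """Make `span` active, then restore the previously active Span on exit."""
    token = _active.set(span)
    try:
        yield span
    finally:
        _active.reset(token)


observed = []
outer, inner = Span("outer"), Span("inner")
with use_span(outer):
    with use_span(inner):
        observed.append(get_current_span().name)
    observed.append(get_current_span().name)  # previous Span is active again
observed.append(get_current_span().valid)  # placeholder once nothing is active
```

Note that `use_span` does not end the Span: activation and lifetime are kept separate, which matches the statement that a finished Span may still be active.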
The implementation MUST provide no-op binary and text Propagators, which the
Tracer SHOULD use by default if other propagators are not configured. SDKs
SHOULD use the W3C HTTP Trace Context as the default text format. For more
details, see trace-context.
A SpanContext represents the portion of a Span which must be serialized and
propagated alongside a distributed context. SpanContexts are immutable.
SpanContext MUST be a final (sealed) class.
The OpenTelemetry SpanContext representation conforms to the W3C TraceContext
specification. It contains two
identifiers - a TraceId and a SpanId - along with a set of common
TraceFlags and system-specific TraceState values.
TraceId: A valid trace identifier is a 16-byte array with at least one
non-zero byte.
SpanId: A valid span identifier is an 8-byte array with at least one non-zero
byte.
TraceFlags contain details about the trace. Unlike TraceState values,
TraceFlags are present in all traces. Currently, the only TraceFlag is a
boolean sampled flag.
TraceState carries system-specific configuration data, represented as a list
of key-value pairs. TraceState allows multiple tracing systems to participate in
the same trace.
IsValid is a boolean flag which returns true if the SpanContext has a non-zero
TraceId and a non-zero SpanId.
IsRemote is a boolean flag which returns true if the SpanContext was propagated
from a remote parent.
Please review the W3C specification for details on the Tracestate field.
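A minimal Python sketch of an immutable SpanContext follows. The field names, the tuple-of-pairs representation of TraceState, and the length checks inside `is_valid` are assumptions of this example; the text above only requires non-zero identifiers.

```python
from dataclasses import dataclass
from typing import Tuple, final


@final  # signals to type checkers that the class must not be subclassed
@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class SpanContext:
    trace_id: bytes  # 16 bytes, at least one non-zero
    span_id: bytes  # 8 bytes, at least one non-zero
    sampled: bool = False  # the only TraceFlag currently defined
    trace_state: Tuple[Tuple[str, str], ...] = ()  # ordered key-value pairs
    is_remote: bool = False  # True if propagated from a remote parent

    @property
    def is_valid(self) -> bool:
        return (
            len(self.trace_id) == 16
            and any(self.trace_id)
            and len(self.span_id) == 8
            and any(self.span_id)
        )


# All-zero identifiers, usable as the placeholder for "no active Span".
INVALID_SPAN_CONTEXT = SpanContext(bytes(16), bytes(8))
```

For example, `SpanContext(bytes([1] * 16), bytes([1] * 8)).is_valid` evaluates to true, while `INVALID_SPAN_CONTEXT.is_valid` is false.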
A Span represents a single operation within a trace. Spans can be nested to
form a trace tree. Each trace contains a root span, which typically describes
the end-to-end latency and, optionally, one or more sub-spans for its
sub-operations.
Spans encapsulate:
- The span name
- An immutable SpanContext that uniquely identifies the Span
- A parent span in the form of a Span, SpanContext, or null
- A start timestamp
- An end timestamp
- An ordered mapping of Attributes
- A list of Links to other Spans
- A list of timestamped Events
- A Status.
The span name is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. The span name should be the most general string that identifies a (statistically) interesting class of Spans, rather than individual Span instances. That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality.
For example, here are potential span names for an endpoint that gets hypothetical account information:
| Span Name | Guidance |
|---|---|
| get | Too general |
| get_account/42 | Too specific |
| get_account | Good, and account_id=42 would make a nice Span attribute |
| get_account/{accountId} | Also good (using the "HTTP route") |
The Span's start and end timestamps reflect the elapsed real time of the
operation. A Span's start time SHOULD be set to the current time on span
creation. After the Span is created, it SHOULD be possible to
change its name, set its Attributes, and add Links and Events. These
MUST NOT be changed after the Span's end time has been set.
Spans are not meant to be used to propagate information within a process. To
prevent misuse, implementations SHOULD NOT provide access to a Span's
attributes besides its SpanContext.
Vendors may implement the Span interface to effect vendor-specific logic.
However, alternative implementations MUST NOT allow callers to create Spans
directly. All Spans MUST be created via a Tracer.
Implementations MUST provide a way to create Spans via a Tracer. By default,
the currently active Span is set as the new Span's parent. The Tracer
MAY provide other default options for newly created Spans.
Span creation MUST NOT set the newly created Span as the currently
active Span by default, but this functionality MAY be offered additionally
as a separate operation.
The API MUST accept the following parameters:
- The operation name. This is a required parameter.
- The parent Span or parent Span context, and whether the new Span should be a root Span. The API MAY also have an option for implicit parent context extraction from the current context as a default behavior.
- SpanKind, defaulting to SpanKind.Internal if not specified.
- Attributes: A collection of key-value pairs, with the same semantics as the ones settable with Span::SetAttributes. Additionally, these attributes may be used to make a sampling decision as noted in sampling description. An empty collection will be assumed if not specified. Whenever possible, users SHOULD set any already known attributes at span creation instead of calling SetAttribute later.
- Links: see the Links description. An empty list will be assumed if not specified.
- Start timestamp, defaulting to the current time. This argument SHOULD only be set when span creation time has already passed. If the API is called at the moment of a Span's logical start, the API user MUST NOT explicitly set this argument.
Each span has zero or one parent span and zero or more child spans, which
represent causally related operations. A tree of related spans comprises a
trace. A span is said to be a root span if it does not have a parent. Each
trace includes a single root span, which is the shared ancestor of all other
spans in the trace. Implementations MUST provide an option to create a Span as
a root span, and MUST generate a new TraceId for each root span created.
For a Span with a parent, the TraceId MUST be the same as the parent.
Also, the child span MUST inherit all TraceState values of its parent by default.
A Span is said to have a remote parent if it is the child of a Span
created in another process. Each propagator's deserialization must set
IsRemote to true on a parent SpanContext so Span creation knows if the
parent is remote.
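The parent and root rules above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: `new_span_context` and `_random_id` are hypothetical helpers, and identifiers are drawn from `os.urandom` purely for the example.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes
    span_id: bytes
    trace_state: Tuple[Tuple[str, str], ...] = ()
    is_remote: bool = False


def _random_id(length: int) -> bytes:
    """Random identifier with at least one non-zero byte."""
    while True:
        candidate = os.urandom(length)
        if any(candidate):
            return candidate


def new_span_context(parent: Optional[SpanContext]) -> SpanContext:
    if parent is None:
        # Root span: a new TraceId is generated.
        return SpanContext(_random_id(16), _random_id(8))
    # Child span: same TraceId as the parent, TraceState inherited by default.
    # The child itself is local, so is_remote stays False.
    return SpanContext(parent.trace_id, _random_id(8), parent.trace_state)
```

A propagator that deserializes a context from an incoming request would construct the parent with `is_remote=True`, so span creation can tell that the parent lives in another process.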
During Span creation, the user MUST have the ability to record links to other Spans. Linked
Spans can be from the same or a different trace. See Links
description.
A Link is defined by the following properties:
- (Required) SpanContext of the Span to link to.
- (Optional) One or more Attribute.
The Link SHOULD be an immutable type.
The Span creation API should provide:
- An API to record a single Link where the Link properties are passed as arguments. This MAY be called AddLink.
- An API to record a single Link whose attributes or attribute values are lazily constructed, with the intention of avoiding unnecessary work if a link is unused. If the language supports overloads then this SHOULD be called AddLink; otherwise AddLazyLink MAY be considered. In some languages, it might be easier to defer Link or attribute creation entirely by providing a wrapping class or function that returns a Link or formatted attributes. When providing a wrapping class or function it SHOULD be named LinkFormatter.
Links SHOULD preserve the order in which they're set.
With the exception of the method to retrieve the Span's SpanContext and
recording status, none of the below may be called after the Span is finished.
The Span interface MUST provide:
- An API that returns the SpanContext for the given Span. The returned value may be used even after the Span is finished. The returned value MUST be the same for the entire Span lifetime. This MAY be called GetContext.
Returns true if this Span is recording information like events with the
AddEvent operation, attributes using SetAttributes, status with SetStatus,
etc.
There should be no parameters.
This flag SHOULD be used to avoid expensive computations of a Span's attributes or
events when a Span is definitely not recorded. Note that any child
span's recording is determined independently from the value of this flag
(typically based on the sampled flag of a TraceFlag on
SpanContext).
This flag may be true despite the entire trace being sampled out. This
allows information about the individual Span to be recorded and processed without
sending it to the backend. An example of this scenario may be recording and
processing of all incoming requests for the processing and building of
SLA/SLO latency charts while sending only a subset - sampled spans - to the
backend. See also the sampling section of SDK design.
Users of the API should only access the IsRecording property when
instrumenting code and never access SampledFlag unless used in context
propagators.
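The intended use of the recording flag can be sketched as a guard around expensive work. The `Span` class, `expensive_summary`, and `instrument` below are hypothetical and exist only to show the pattern.

```python
class Span:
    """Hypothetical Span exposing only what this example needs."""

    def __init__(self, recording: bool) -> None:
        self._recording = recording
        self.attributes = {}

    def is_recording(self) -> bool:
        return self._recording

    def set_attribute(self, key: str, value) -> None:
        if self._recording:
            self.attributes[key] = value


calls = []  # counts how often the expensive computation actually runs


def expensive_summary(payload):
    calls.append(1)
    return ",".join(sorted(payload))


def instrument(span: Span, payload) -> None:
    # Check the flag first so the summary is never computed for a Span
    # that is definitely not recorded.
    if span.is_recording():
        span.set_attribute("payload.summary", expensive_summary(payload))
```

Instrumentation code consults only `is_recording()` here; the sampled flag is left to context propagators, as the text above recommends.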
A Span MUST have the ability to set attributes associated with it.
An Attribute is defined by the following properties:
- (Required) The attribute key, which must be a string.
- (Required) The attribute value, which is either:
- A primitive type: string, boolean or numeric.
- An array of primitive type values. The array MUST be homogeneous, i.e. it MUST NOT contain values of different types.
The Span interface MUST provide:
- An API to set a single Attribute where the attribute properties are passed as arguments. This MAY be called SetAttribute. To avoid extra allocations some implementations may offer a separate API for each of the possible value types.
Attributes SHOULD preserve the order in which they're set. Setting an attribute with the same key as an existing attribute SHOULD overwrite the existing attribute's value.
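A small Python sketch of these attribute rules is shown below. The `Attributes` container and the decision to treat integers and floats as one "numeric" kind are assumptions of the example; a real implementation may classify value types differently.

```python
def _kind(value) -> str:
    """Classify a primitive value; bool is checked first because it subclasses int."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (int, float)):
        return "numeric"
    raise TypeError(f"unsupported attribute value: {value!r}")


def validate(value) -> None:
    if isinstance(value, (list, tuple)):
        kinds = {_kind(item) for item in value}
        if len(kinds) > 1:
            raise TypeError("attribute arrays must be homogeneous")
    else:
        _kind(value)


class Attributes:
    """Hypothetical ordered attribute mapping."""

    def __init__(self) -> None:
        self._items = {}  # dict keeps insertion order

    def set(self, key, value) -> None:
        if not isinstance(key, str):
            raise TypeError("attribute key must be a string")
        validate(value)
        self._items[key] = value  # same key: overwrite the existing value

    def items(self):
        return list(self._items.items())
```

Because a Python `dict` keeps a key at its original position when its value is overwritten, the first-set order is preserved even after updates.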
Note that the OpenTelemetry project documents certain "standard attributes" that have prescribed semantic meanings.
A Span MUST have the ability to add events. Events have a time associated
with the moment when they are added to the Span.
An Event is defined by the following properties:
- (Required) Name of the event.
- (Optional) One or more Attribute.
- (Optional) Timestamp for the event.
The Event SHOULD be an immutable type.
The Span interface MUST provide:
- An API to record a single Event where the Event properties are passed as arguments. This MAY be called AddEvent.
- An API to record a single Event whose attributes or attribute values are lazily constructed, with the intention of avoiding unnecessary work if an event is unused. If the language supports overloads then this SHOULD be called AddEvent; otherwise AddLazyEvent MAY be considered. In some languages, it might be easier to defer Event or attribute creation entirely by providing a wrapping class or function that returns an Event or formatted attributes. When providing a wrapping class or function it SHOULD be named EventFormatter.
Events SHOULD preserve the order in which they're set. This will typically match the ordering of the events' timestamps.
Note that the OpenTelemetry project documents certain "standard event names and keys" which have prescribed semantic meanings.
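The eager and lazy event APIs can be sketched together in Python, where a single `add_event` accepts either a ready-made attribute mapping or a zero-argument callable that builds one. The names and the "callable as formatter" convention are assumptions of this example.

```python
import time
from typing import Callable, Dict, List, NamedTuple, Optional, Union


class Event(NamedTuple):  # immutable record
    name: str
    attributes: Dict[str, object]
    timestamp_ns: int


AttributesOrFormatter = Union[Dict[str, object], Callable[[], Dict[str, object]]]


class Span:
    """Hypothetical Span that stores events in the order they are added."""

    def __init__(self, recording: bool = True) -> None:
        self._recording = recording
        self.events: List[Event] = []

    def add_event(
        self,
        name: str,
        attributes: Optional[AttributesOrFormatter] = None,
        timestamp_ns: Optional[int] = None,
    ) -> None:
        if not self._recording:
            return  # a lazy formatter is never invoked for an unused event
        if callable(attributes):
            attributes = attributes()  # build the attributes only now
        if timestamp_ns is None:
            timestamp_ns = time.time_ns()  # default: the moment of the call
        self.events.append(Event(name, dict(attributes or {}), timestamp_ns))
```

A caller could write `span.add_event("cache.miss", lambda: {"key": compute_key()})`, where `compute_key` is a placeholder for any costly computation that should be skipped when the Span is not recording.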
Sets the Status of the Span. If used, this will override the
default Span status, which is OK.
Only the value of the last call will be recorded, and implementations are free to ignore previous calls.
The Span interface MUST provide:
- An API to set the Status where the new status is the only argument. This SHOULD be called SetStatus.
Updates the Span name. Upon this update, any sampling behavior based on Span
name will depend on the implementation.
It is highly discouraged to update the name of a Span after its creation.
Span name is often used to group, filter and identify the logical groups of
spans. And often, filtering logic will be implemented before the Span creation
for performance reasons. Thus the name update may interfere with this logic.
The method name is called UpdateName to differentiate this method from the
regular property setter. It emphasizes that this operation signifies a
major change for a Span and may lead to re-calculation of sampling or
filtering decisions made previously depending on the implementation.
Alternatives for the name update may be late Span creation, when Span is
started with the explicit timestamp from the past at the moment where the final
Span name is known, or reporting a Span with the desired name as a child
Span.
Required parameters:
- The new operation name, which supersedes whatever was passed in when the Span was started
Finish the Span. This call will take the current timestamp to set as Span's
end time. Implementations MUST ignore all subsequent calls to End (there might
be exceptions when the Tracer is streaming events and has no mutable state associated
with the Span).
Calling End on a Span MUST NOT have any effects on child spans. Those may
still be running and can be ended later.
Parameters:
- (Optional) Timestamp to explicitly set the end timestamp
This API MUST be non-blocking.
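The End semantics above can be sketched as an idempotent, non-blocking method. The `Span` class below is hypothetical; silently ignoring attribute updates after the end time is set is one possible reading of "MUST NOT be changed", chosen here for illustration.

```python
import time
from typing import Optional


class Span:
    """Hypothetical Span tracking only its lifetime and attributes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.start_ns = time.time_ns()  # start time is set on creation
        self.end_ns: Optional[int] = None
        self.attributes = {}

    def set_attribute(self, key: str, value) -> None:
        if self.end_ns is not None:
            return  # no changes once the end time has been set
        self.attributes[key] = value

    def end(self, end_ns: Optional[int] = None) -> None:
        if self.end_ns is not None:
            return  # subsequent calls to End are ignored
        # Only records a timestamp; no I/O happens here, so the call does not block.
        self.end_ns = end_ns if end_ns is not None else time.time_ns()
```

Ending a parent in this sketch touches only the parent's own state, so child spans can keep running and be ended later.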
Span lifetime represents the process of recording the start and the end timestamps to the Span object:
- The start time is recorded when the Span is created.
- The end time needs to be recorded when the operation is ended.
Start and end times, as well as Event timestamps, MUST be recorded at the time the corresponding API is called.
Status interface represents the status of a finished Span. It's composed of
a canonical code in conjunction with an optional descriptive message.
StatusCanonicalCode represents the canonical set of status codes of a finished
Span, following the standard gRPC codes:
- Ok: The operation completed successfully.
- Cancelled: The operation was cancelled (typically by the caller).
- Unknown: An unknown error.
- InvalidArgument: Client specified an invalid argument. Note that this differs from FailedPrecondition. InvalidArgument indicates arguments that are problematic regardless of the state of the system.
- DeadlineExceeded: Deadline expired before operation could complete. For operations that change the state of the system, this error may be returned even if the operation has completed successfully.
- NotFound: Some requested entity (e.g., file or directory) was not found.
- AlreadyExists: Some entity that we attempted to create (e.g., file or directory) already exists.
- PermissionDenied: The caller does not have permission to execute the specified operation. PermissionDenied must not be used if the caller cannot be identified (use Unauthenticated instead for those errors).
- ResourceExhausted: Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space.
- FailedPrecondition: Operation was rejected because the system is not in a state required for the operation's execution.
- Aborted: The operation was aborted, typically due to a concurrency issue like sequencer check failures, transaction aborts, etc.
- OutOfRange: Operation was attempted past the valid range. E.g., seeking or reading past end of file. Unlike InvalidArgument, this error indicates a problem that may be fixed if the system state changes.
- Unimplemented: Operation is not implemented or not supported/enabled in this service.
- Internal: Internal errors. Means some invariants expected by the underlying system have been broken.
- Unavailable: The service is currently unavailable. This is most likely a transient condition and may be corrected by retrying with a backoff.
- DataLoss: Unrecoverable data loss or corruption.
- Unauthenticated: The request does not have valid authentication credentials for the operation.
API MUST provide a way to create a new Status.
Required parameters:
- StatusCanonicalCode of this Status.
Optional parameters:
- Description of this Status.
Returns the StatusCanonicalCode of this Status.
Returns the description of this Status.
Languages should follow their usual conventions on whether to return null or an empty string here if no description was given.
Returns true if the canonical code of this Status is Ok, otherwise false.
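The Status interface described above could look roughly like this in Python. The class and property names are illustrative assumptions; the numeric values follow the gRPC status code numbering, and returning `None` for a missing description is just one of the conventions the text allows.

```python
from enum import Enum
from typing import Optional


class StatusCanonicalCode(Enum):
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


class Status:
    """Hypothetical Status: a canonical code plus an optional description."""

    def __init__(
        self,
        canonical_code: StatusCanonicalCode = StatusCanonicalCode.OK,
        description: Optional[str] = None,
    ) -> None:
        self._canonical_code = canonical_code
        self._description = description

    @property
    def canonical_code(self) -> StatusCanonicalCode:
        return self._canonical_code

    @property
    def description(self) -> Optional[str]:
        return self._description

    @property
    def is_ok(self) -> bool:
        return self._canonical_code is StatusCanonicalCode.OK
```

For instance, `Status(StatusCanonicalCode.NOT_FOUND, "account 42 missing")` carries both the code and a human-readable message, while `Status()` represents the default OK status.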
SpanKind describes the relationship between the Span, its parents,
and its children in a Trace. SpanKind describes two independent
properties that benefit tracing systems during analysis.
The first property described by SpanKind reflects whether the Span
is a remote child or parent. Spans with a remote parent are
interesting because they are sources of external load. Spans with a
remote child are interesting because they reflect a non-local system
dependency.
The second property described by SpanKind reflects whether a child
Span represents a synchronous call. When a child span is synchronous,
the parent is expected to wait for it to complete under ordinary
circumstances. It can be useful for tracing systems to know this
property, since synchronous Spans may contribute to the overall trace
latency. Asynchronous scenarios can be remote or local.
In order for SpanKind to be meaningful, callers should arrange that
a single Span does not serve more than one purpose. For example, a
server-side span should not be used directly as the parent of another
remote span. As a simple guideline, instrumentation should create a
new Span prior to extracting and serializing the span context for a
remote call.
These are the possible SpanKinds:
- SERVER: Indicates that the span covers server-side handling of a synchronous RPC or other remote request. This span is the child of a remote CLIENT span that was expected to wait for a response.
- CLIENT: Indicates that the span describes a synchronous request to some remote service. This span is the parent of a remote SERVER span and waits for its response.
- PRODUCER: Indicates that the span describes the parent of an asynchronous request. This parent span is expected to end before the corresponding child CONSUMER span, possibly even before the child span starts. In messaging scenarios with batching, tracing individual messages requires a new PRODUCER span per message to be created.
- CONSUMER: Indicates that the span describes the child of an asynchronous PRODUCER request.
- INTERNAL: Default value. Indicates that the span represents an internal operation within an application, as opposed to an operation with remote parents or children.
To summarize the interpretation of these kinds:
| SpanKind | Synchronous | Asynchronous | Remote Incoming | Remote Outgoing |
|---|---|---|---|---|
| CLIENT | yes | | | yes |
| SERVER | yes | | yes | |
| PRODUCER | | yes | | maybe |
| CONSUMER | | yes | maybe | |
| INTERNAL | | | | |
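The SpanKind interpretation can also be expressed as a small lookup, which an analysis tool might use to classify spans. The enum values and helper functions below are hypothetical and encode only the synchronous and asynchronous properties and the remote direction stated in the kind descriptions.

```python
from enum import Enum


class SpanKind(Enum):
    INTERNAL = "internal"  # default value
    SERVER = "server"
    CLIENT = "client"
    PRODUCER = "producer"
    CONSUMER = "consumer"


def is_synchronous(kind: SpanKind) -> bool:
    """CLIENT and SERVER describe synchronous requests."""
    return kind in (SpanKind.CLIENT, SpanKind.SERVER)


def is_asynchronous(kind: SpanKind) -> bool:
    """PRODUCER and CONSUMER describe asynchronous requests."""
    return kind in (SpanKind.PRODUCER, SpanKind.CONSUMER)


def is_child_side(kind: SpanKind) -> bool:
    """SERVER and CONSUMER spans are the child of a CLIENT or PRODUCER span."""
    return kind in (SpanKind.SERVER, SpanKind.CONSUMER)
```

INTERNAL falls into none of these groups, which reflects its role as the kind for operations with no remote parents or children.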