diff --git a/CHANGELOG.md b/CHANGELOG.md
index 01380bca034..96a4e91b152 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,9 +1,8 @@
 # Changelog

-Please update changelog as part of any significant pull request. Place short
-description of your change into "Unreleased" section. As part of release
-process content of "Unreleased" section content will generate release notes for
-the release.
+Please update the changelog as part of any significant pull request.
+Place a short description of your change into the "Unreleased" section.
+As part of the release process, the content of the "Unreleased" section will generate the release notes for the release.

 ## Unreleased

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 1f167e9e705..7381b17894b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,14 +2,11 @@

 Welcome to OpenTelemetry specifications repository!

-Before you start - see OpenTelemetry general
-[contributing](https://github.com/open-telemetry/community/blob/master/CONTRIBUTING.md)
-requirements and recommendations.
+Before you start, see the OpenTelemetry general [contributing](https://github.com/open-telemetry/community/blob/master/CONTRIBUTING.md) requirements and recommendations.

 ## Sign the CLA

-Before you can contribute, you will need to sign the [Contributor License
-Agreement](https://identity.linuxfoundation.org/projects/cncf).
+Before you can contribute, you will need to sign the [Contributor License Agreement](https://identity.linuxfoundation.org/projects/cncf).

 ## Proposing a change

@@ -17,15 +14,10 @@ Significant changes should go through the [RFC process](https://github.com/open-

 ## Writing specs

-Specification is written in markdown format. Please make sure files are rendered
-OK on GitHub.
+The specification is written in Markdown format.
+Please make sure files render correctly on GitHub.

-Be sure to clearly define the specification requirements using the key words
-defined in [BCP 14](https://tools.ietf.org/html/bcp14)
-[[RFC2119](https://tools.ietf.org/html/rfc2119)]
-[[RFC8174](https://tools.ietf.org/html/rfc8174)] while making sure to heed the
-guidance laid out in [RFC2119](https://tools.ietf.org/html/rfc2119) about the
-sparing use of imperatives:
+Be sure to clearly define the specification requirements using the key words defined in [BCP 14](https://tools.ietf.org/html/bcp14) [[RFC2119](https://tools.ietf.org/html/rfc2119)] [[RFC8174](https://tools.ietf.org/html/rfc8174)] while making sure to heed the guidance laid out in [RFC2119](https://tools.ietf.org/html/rfc2119) about the sparing use of imperatives:

 > Imperatives of the type defined in this memo must be used with care
 > and sparingly. In particular, they MUST only be used where it is
@@ -35,21 +27,13 @@ sparing use of imperatives:
 > on implementors where the method is not required for
 > interoperability.

-It is important to build a specification that is clear and useful, not
-one that is needlessly restrictive and complex.
+It is important to build a specification that is clear and useful, not one that is needlessly restrictive and complex.

 ### Markdown style

 Markdown files should be properly formatted before a pull request is sent out.

-In this repository we follow the
-[markdownlint rules](https://github.com/DavidAnson/markdownlint#rules--aliases)
-with some customizations. See [markdownlint](.markdownlint.yaml) or
-[settings](.vscode/settings.json) for details.
-
-We highly encourage to use line breaks in markdown files at `80` characters
-wide. There are tools that can do it for you effectively. Please submit proposal
-to include your editor settings required to enable this behavior so the out of
-the box settings for this repository will be consistent.
+In this repository, we follow the [markdownlint rules](https://github.com/DavidAnson/markdownlint#rules--aliases) with some customizations.
+See [markdownlint](.markdownlint.yaml) or [settings](.vscode/settings.json) for details.

 To check for style violations, use

@@ -59,11 +43,8 @@ gem install mdl
 mdl -c .mdlrc .
 ```

-To fix style violations, follow the
-[instruction](https://github.com/DavidAnson/markdownlint#optionsresultversion)
-with the Node version of markdownlint. If you are using Visual Studio Code,
-you can also use the `fixAll` command of the
-[vscode markdownlint extension](https://github.com/DavidAnson/vscode-markdownlint).
+To fix style violations, follow the [instructions](https://github.com/DavidAnson/markdownlint#optionsresultversion) with the Node version of markdownlint.
+If you are using Visual Studio Code, you can also use the `fixAll` command of the [vscode markdownlint extension](https://github.com/DavidAnson/vscode-markdownlint).

 ### Misspell check

diff --git a/issue-management.md b/issue-management.md
index 69144e9b42b..723c0db876f 100644
--- a/issue-management.md
+++ b/issue-management.md
@@ -1,28 +1,23 @@
 # Issue Management for OpenTelemetry

-It's an important community goal for OpenTelemetry that our members find the backlogs
-to be responsive, and easy to take part in. Shared practices will simplify collaboration
-and engagement as well as help standardize on automation and overall project management.
+It's an important community goal for OpenTelemetry that our members find the backlogs to be responsive and easy to take part in.
+Shared practices will simplify collaboration and engagement as well as help standardize on automation and overall project management.

-SIGs are encouraged to experiment with labels and backlog management procedures,
-including project board. This document only covers the bare bones of issue management
-which should work the same across all SIGs, to help maintain a responsive backlog and
-allow us to track work across all projects in a similar manner.
+SIGs are encouraged to experiment with labels and backlog management procedures, including project boards.
+This document only covers the bare bones of issue management which should work the same across all SIGs, to help maintain a responsive backlog and allow us to track work across all projects in a similar manner.

 ## Roles

 - OP:
   - Original Poster. This is the person who has opened or posted the issue.
 - Maintainer (aka Triager, or anyone performing that role):
-  - Person who is triaging the issue by determining its workability. This person is
-    responsible for getting the tickets to one of two stages - 1) `help-wanted`
-    2) `will-not-fix`. They are responsible for triaging by working with the OP to get
-    additional information as needed and analyzing the issue and adding relevant
-    details/information/guidance that would be helpful to the resolution of the issue.
+  - Person who is triaging the issue by determining its workability.
+    This person is responsible for getting the ticket to one of two stages: 1) `help-wanted` or 2) `will-not-fix`.
+    They are responsible for triaging by working with the OP to get additional information as needed, analyzing the issue, and adding relevant details/information/guidance that would be helpful to the resolution of the issue.
 - Collaborator:
-  - Person(s) that are actually doing the work related to the ticket. Once work is done,
-    they work with the Reviewer to get feedback implemented and complete the work. They
-    are responsible for making sure issue status labels are up to date.
+  - Person(s) who are actually doing the work related to the ticket.
+    Once work is done, they work with the Reviewer to get feedback implemented and complete the work.
+    They are responsible for making sure issue status labels are up to date.
 - Reviewer:
   - Person whose Approval is needed to merge the PR.

@@ -39,51 +34,34 @@ allow us to track work across all projects in a similar manner.
 - The Maintainer can also label the issue as
   - `URGENT` (for critical issues)
   - `help-wanted` for issues which require work and have no one assigned
-- Once a Collaborator is assigned, please remove `help-wanted` and add `wip` to
-  the issue.
+- Once a Collaborator is assigned, please remove `help-wanted` and add `wip` to the issue.

 ## Closing an Issue

 - Review criteria:
-  - For interface and design changes: 2 approvals - which must be from reviewers
-    who work at different companies than the Collaborator.
+  - For interface and design changes: 2 approvals, which must come from reviewers who work at different companies than the Collaborator.
   - For smaller or internal changes: 1 approval from a different company.
 - For `URGENT` issues:
-  - Collaborator: please provide an initial assessment of the issues to OP ASAP or
-    within 1 business day, whichever is earlier.
-  - Reviewer: please review and provide feedback ASAP or within 1 business day,
-    whichever is earlier.
-  - Collaborator: please provide an update and/or response to each review comment ASAP
-    or within 1 business day, whichever is sooner. Merge should happen as soon as
-    review criteria are met.
+  - Collaborator: please provide an initial assessment of the issue to the OP ASAP or within 1 business day, whichever is earlier.
+  - Reviewer: please review and provide feedback ASAP or within 1 business day, whichever is earlier.
+  - Collaborator: please provide an update and/or response to each review comment ASAP or within 1 business day, whichever is sooner. Merge should happen as soon as review criteria are met.
 - For non-`URGENT` issues
-  - Collaborator: please provide an initial response or assessment of the issue to
-    OP within 3 business days.
+  - Collaborator: please provide an initial response or assessment of the issue to the OP within 3 business days.
   - Reviewer: please review and provide feedback within 3 business days.
-  - Collaborator: please provide an update and/or response to each review comment
-    within 3 business days. Once all review comments are resolved, please allow
-    1-2 business days for others to raise additional comments/questions, unless
-    the changes are fixing typos, bugs, documentation, test enhancements, or
-    implementing already discussed design.
+  - Collaborator: please provide an update and/or response to each review comment within 3 business days.
+    Once all review comments are resolved, please allow 1-2 business days for others to raise additional comments/questions, unless the changes are fixing typos, bugs, documentation, test enhancements, or implementing an already discussed design.

-When closing an issue that we `will-not-fix` or we believe need no further
-action, please provide the rationale for closing, and indicate that OP can
-re-open for discussion if there are additional info, justification and
-questions.
+When closing an issue that we `will-not-fix`, or one we believe needs no further action, please provide the rationale for closing, and indicate that the OP can re-open it for discussion with additional info, justification, or questions.

 ## When Issues Get Stuck

-Some issues are not directly related to a particular code change. If an
-issue is worth considering in the issue backlog, but not scoped clearly
-enough for work to begin, then please label it `needs-discussion`.
+Some issues are not directly related to a particular code change.
+If an issue is worth considering in the issue backlog, but not scoped clearly enough for work to begin, then please label it `needs-discussion`.

 - When possible, move the discussion forward by using tests and code examples.
-- If discussion happens elsewhere, record relevant meeting notes into the
-  issue.
-- When an agreement is made, clearly summarize the decision, and list any
-  resulting action items which need to be addressed.
+- If discussion happens elsewhere, record relevant meeting notes into the issue.
+- When an agreement is made, clearly summarize the decision, and list any resulting action items which need to be addressed.

-If an issue is stuck because someone is not responding, please add the `stale`
-label. It is possible to automate this. E.g.
-The minimum time elapsed before the `stale` label is applied is proposed to be
-one week.
+If an issue is stuck because someone is not responding, please add the `stale` label.
+It is possible to automate this.
+For example, the minimum time elapsed before the `stale` label is applied is proposed to be one week.
diff --git a/specification/concurrency.md b/specification/concurrency.md
index 0964cadd21f..2af89f3f971 100644
--- a/specification/concurrency.md
+++ b/specification/concurrency.md
@@ -1,22 +1,17 @@
 # Concurrency and Thread-Safety of OpenTelemetry API

-For languages which support concurrent execution the OpenTelemetry APIs provide
-specific guarantees and safeties. Not all of API functions are safe to
-be called concurrently. Function and method documentation must explicitly
-specify whether it is safe or no to make concurrent calls and in what
-situations.
+For languages which support concurrent execution, the OpenTelemetry APIs provide specific guarantees and safeties.
+Not all API functions are safe to be called concurrently.
+Function and method documentation must explicitly specify whether it is safe or not to make concurrent calls, and in what situations.

-The following are general recommendations of concurrent call safety of
-specific subsets of the API.
+The following are general recommendations on the concurrent call safety of specific subsets of the API.

 **Tracer** - all methods are safe to be called concurrently.

-**SpanBuilder** - It is not safe to concurrently call any methods of the
-same SpanBuilder instance. Different instances of SpanBuilder can be safely
-used concurrently by different threads/coroutines, provided that no single
-SpanBuilder is used by more than one thread/coroutine.
+**SpanBuilder** - It is not safe to concurrently call any methods of the same SpanBuilder instance.
+Different instances of SpanBuilder can be safely used concurrently by different threads/coroutines, provided that no single SpanBuilder is used by more than one thread/coroutine.

 **Span** - All methods of Span are safe to be called concurrently.

-**Link** - Links are immutable and is safe to be used concurrently. Lazy
-initialized links must be implemented to be safe to be called concurrently.
+**Link** - Links are immutable and are safe to be used concurrently.
+Lazily initialized links must be implemented to be safe to be called concurrently.
diff --git a/specification/context/api-propagators.md b/specification/context/api-propagators.md
index 89640f1e028..5b4980204e7 100644
--- a/specification/context/api-propagators.md
+++ b/specification/context/api-propagators.md
@@ -21,47 +21,38 @@ Table of Contents

 ## Overview

-Cross-cutting concerns send their state to the next process using
-`Propagator`s, which are defined as objects used to read and write
-context data to and from messages exchanged by the applications.
+Cross-cutting concerns send their state to the next process using `Propagator`s, which are defined as objects used to read and write context data to and from messages exchanged by the applications.

 Each concern creates a set of `Propagator`s for every supported `Format`.

-Propagators leverage the `Context` to inject and extract data for each
-cross-cutting concern, such as traces and correlation context.
+Propagators leverage the `Context` to inject and extract data for each cross-cutting concern, such as traces and correlation context.

-The Propagators API is expected to be leveraged by users writing
-instrumentation libraries.
+The Propagators API is expected to be leveraged by users writing instrumentation libraries.

 The Propagators API currently consists of one `Format`:

-- `HTTPTextFormat` is used to inject values into and extract values from carriers as text that travel
-  in-band across process boundaries.
+- `HTTPTextFormat` is used to inject values into and extract values from carriers as text that travels in-band across process boundaries.

 A binary `Format` will be added in the future.

 ## HTTP Text Format

-`HTTPTextFormat` is a formatter that injects and extracts a cross-cutting concern
-value as text into carriers that travel in-band across process boundaries.
+`HTTPTextFormat` is a formatter that injects and extracts a cross-cutting concern value as text into carriers that travel in-band across process boundaries.

-Encoding is expected to conform to the HTTP Header Field semantics. Values are often encoded as
-RPC/HTTP request headers.
+Encoding is expected to conform to the HTTP Header Field semantics.
+Values are often encoded as RPC/HTTP request headers.

-The carrier of propagated data on both the client (injector) and server (extractor) side is
-usually an http request. Propagation is usually implemented via library-specific request
-interceptors, where the client-side injects values and the server-side extracts them.
+The carrier of propagated data on both the client (injector) and server (extractor) side is usually an HTTP request.
+Propagation is usually implemented via library-specific request interceptors, where the client side injects values and the server side extracts them.

-`HTTPTextFormat` MUST expose the APIs that injects values into carriers,
-and extracts values from carriers.
+`HTTPTextFormat` MUST expose the APIs that inject values into carriers and extract values from carriers.

 ### Fields

-The propagation fields defined. If your carrier is reused, you should delete the fields here
-before calling [inject](#inject).
+The propagation fields used by this formatter.
+If your carrier is reused, you should delete the fields here before calling [inject](#inject).

-For example, if the carrier is a single-use or immutable request object, you don't need to
-clear fields as they couldn't have been set before. If it is a mutable, retryable object,
-successive calls should clear these fields first.
+For example, if the carrier is a single-use or immutable request object, you don't need to clear fields as they couldn't have been set before.
+If it is a mutable, retryable object, successive calls should clear these fields first.

 The use cases of this are:

@@ -72,11 +63,14 @@ Returns list of fields that will be used by this formatter.

 ### Inject

-Injects the value downstream. For example, as http headers.
+Injects the value downstream.
+For example, as HTTP headers.

 Required arguments:

-- A `Context`. The Propagator MUST retrieve the appropriate value from the `Context` first, such as `SpanContext`, `CorrelationContext` or another cross-cutting concern context. For languages supporting current `Context` state, this argument is OPTIONAL, defaulting to the current `Context` instance.
+- A `Context`.
+  The Propagator MUST retrieve the appropriate value from the `Context` first, such as `SpanContext`, `CorrelationContext` or another cross-cutting concern context.
+  For languages supporting current `Context` state, this argument is OPTIONAL, defaulting to the current `Context` instance.
 - the carrier that holds propagation fields. For example, an outgoing message or http request.
 - the `Setter` invoked for each propagation key to add or remove.

@@ -86,7 +80,8 @@ Setter is an argument in `Inject` that sets value into given field.

 `Setter` allows a `HTTPTextFormat` to set propagated fields into a carrier.

-`Setter` MUST be stateless and allowed to be saved as a constant to avoid runtime allocations. One of the ways to implement it is `Setter` class with `Set` method as described below.
+`Setter` MUST be stateless and allowed to be saved as a constant to avoid runtime allocations.
+One of the ways to implement it is a `Setter` class with a `Set` method, as described below.

 ##### Set

@@ -94,7 +89,8 @@ Replaces a propagated field with the given value.

 Required arguments:

-- the carrier holds propagation fields. For example, an outgoing message or http request.
+- the carrier that holds propagation fields.
+  For example, an outgoing message or HTTP request.
 - the key of the field.
 - the value of the field.

@@ -114,9 +110,7 @@ Required arguments:

 - the carrier holds propagation fields. For example, an outgoing message or http request.
 - the instance of `Getter` invoked for each propagation key to get.

-Returns a new `Context` derived from the `Context` passed as argument,
-containing the extracted value, which can be a `SpanContext`,
-`CorrelationContext` or another cross-cutting concern context.
+Returns a new `Context` derived from the `Context` passed as argument, containing the extracted value, which can be a `SpanContext`, `CorrelationContext` or another cross-cutting concern context.

 If the extracted value is a `SpanContext`, its `IsRemote` property MUST be set to true.

@@ -126,7 +120,8 @@ Getter is an argument in `Extract` that get value from given field

 `Getter` allows a `HttpTextFormat` to read propagated fields from a carrier.

-`Getter` MUST be stateless and allowed to be saved as a constant to avoid runtime allocations. One of the ways to implement it is `Getter` class with `Get` method as described below.
+`Getter` MUST be stateless and allowed to be saved as a constant to avoid runtime allocations.
+One of the ways to implement it is a `Getter` class with a `Get` method, as described below.

 ##### Get

@@ -137,20 +132,18 @@ Required arguments:

 - the carrier of propagation fields, such as an HTTP request.
 - the key of the field.

-The Get function is responsible for handling case sensitivity. If the getter is intended to work with a HTTP request object, the getter MUST be case insensitive. To improve compatibility with other text-based protocols, text `Format` implementions MUST ensure to always use the canonical casing for their attributes. NOTE: Cannonical casing for HTTP headers is usually title case (e.g. `Content-Type` instead of `content-type`).
+The Get function is responsible for handling case sensitivity.
+If the getter is intended to work with an HTTP request object, the getter MUST be case insensitive.
+To improve compatibility with other text-based protocols, text `Format` implementations MUST always use the canonical casing for their attributes.
+NOTE: Canonical casing for HTTP headers is usually title case (e.g. `Content-Type` instead of `content-type`).

 ## Composite Propagator

-Implementations MUST offer a facility to group multiple `Propagator`s
-from different cross-cutting concerns in order to leverage them as a
-single entity.
+Implementations MUST offer a facility to group multiple `Propagator`s from different cross-cutting concerns in order to leverage them as a single entity.

-The resulting composite `Propagator` will invoke the `Propagators`
-in the order they were specified.
+The resulting composite `Propagator` will invoke the `Propagators` in the order they were specified.

-Each composite `Propagator` will be bound to a specific `Format`, such
-as `HttpTextFormat`, as different `Format`s will likely operate on different
-data types.
+Each composite `Propagator` will be bound to a specific `Format`, such as `HttpTextFormat`, as different `Format`s will likely operate on different data types.

 There MUST be functions to accomplish the following operations.

 - Create a composite propagator

@@ -195,7 +188,8 @@ OpenTelemetry implementations.

 This method MUST exist for each supported `Format`.

-Returns a global `Propagator`. This usually will be composite instance.
+Returns a global `Propagator`.
+This will usually be a composite instance.

 ### Set Global Propagator

@@ -205,4 +199,5 @@ Sets the global `Propagator` instance.

 Required parameters:

-- A `Propagator`. This usually will be a composite instance.
+- A `Propagator`.
+  This will usually be a composite instance.
diff --git a/specification/context/context.md b/specification/context/context.md
index eb64cb3166c..57dc89aad21 100644
--- a/specification/context/context.md
+++ b/specification/context/context.md
@@ -18,47 +18,38 @@ Table of Contents

 ## Overview

-A `Context` is a propagation mechanism which carries execution-scoped values
-across API boundaries and between logically associated execution units.
-Cross-cutting concerns access their data in-process using the same shared
-`Context` object.
-
-A `Context` MUST be immutable, and its write operations MUST
-result in the creation of a new `Context` containing the original
-values and the specified values updated.
-
-Languages are expected to use the single, widely used `Context` implementation
-if one exists for them. In the cases where an extremely clear, pre-existing
-option is not available, OpenTelemetry MUST provide its own `Context`
-implementation. Depending on the language, its usage may be either explicit
-or implicit.
-
-Users writing instrumentation in languages that use `Context` implicitly are
-discouraged from using the `Context` API directly. In those cases, users will
-manipulate `Context` through cross-cutting concerns APIs instead, in order to
-perform operations such as setting trace or correlation context values for
-a specified `Context`.
-
-A `Context` is expected to have the following operations, with their
-respective language differences:
+A `Context` is a propagation mechanism which carries execution-scoped values across API boundaries and between logically associated execution units.
+Cross-cutting concerns access their data in-process using the same shared `Context` object.
+
+A `Context` MUST be immutable, and its write operations MUST result in the creation of a new `Context` containing the original values and the specified values updated.
+
+Languages are expected to use the single, widely used `Context` implementation if one exists for them.
+In the cases where an extremely clear, pre-existing option is not available, OpenTelemetry MUST provide its own `Context` implementation.
+Depending on the language, its usage may be either explicit or implicit.
+
+Users writing instrumentation in languages that use `Context` implicitly are discouraged from using the `Context` API directly.
+In those cases, users will manipulate `Context` through cross-cutting concerns APIs instead, in order to perform operations such as setting trace or correlation context values for a specified `Context`.
+
+A `Context` is expected to have the following operations, with their respective language differences:

 ## Create a key

 Keys are used to allow cross-cutting concerns to control access to their local state.
-They are unique such that other libraries which may use the same context
-cannot accidentally use the same key. It is recommended that concerns mediate
-data access via an API, rather than provide direct public access to their keys.
+They are unique such that other libraries which may use the same context cannot accidentally use the same key.
+It is recommended that concerns mediate data access via an API, rather than provide direct public access to their keys.

 The API MUST accept the following parameter:

-- The key name. The key name exists for debugging purposes and does not uniquely identify the key. Multiple calls to `CreateKey` with the same name SHOULD NOT return the same value unless language constraints dictate otherwise. Different languages may impose different restrictions on the expected types, so this parameter remains an implementation detail.
+- The key name.
+  The key name exists for debugging purposes and does not uniquely identify the key.
+  Multiple calls to `CreateKey` with the same name SHOULD NOT return the same value unless language constraints dictate otherwise.
+  Different languages may impose different restrictions on the expected types, so this parameter remains an implementation detail.

 The API MUST return an opaque object representing the newly created key.

 ## Get value

-Concerns can access their local state in the current execution state
-represented by a `Context`.
+Concerns can access their local state in the current execution state represented by a `Context`.

 The API MUST accept the following parameters:

@@ -69,8 +60,7 @@ The API MUST return the value in the `Context` for the specified key.

 ## Set value

-Concerns can record their local state in the current execution state
-represented by a `Context`.
+Concerns can record their local state in the current execution state represented by a `Context`.
 The API MUST accept the following parameters:

@@ -82,10 +72,8 @@ The API MUST return a new `Context` containing the new value.

 ## Optional Global operations

-These operations are expected to only be implemented by languages
-using `Context` implicitly, and thus are optional. These operations
-SHOULD only be used to implement automatic scope switching and define
-higher level APIs by SDK components and OpenTelemetry instrumentation libraries.
+These operations are expected to only be implemented by languages using `Context` implicitly, and thus are optional.
+These operations SHOULD only be used to implement automatic scope switching and define higher-level APIs by SDK components and OpenTelemetry instrumentation libraries.

 ### Get current Context

@@ -99,13 +87,11 @@ The API MUST accept the following parameters:

 - The `Context`.

-The API MUST return a value that can be used as a `Token` to restore the previous
-`Context`.
+The API MUST return a value that can be used as a `Token` to restore the previous `Context`.

 ### Detach Context

-Resets the `Context` associated with the caller's current execution unit
-to the value it had before attaching a specified `Context`.
+Resets the `Context` associated with the caller's current execution unit to the value it had before attaching a specified `Context`.

 The API MUST accept the following parameters:

diff --git a/specification/correlationcontext/api.md b/specification/correlationcontext/api.md
index 8c0282b6849..131f9cf0f4f 100644
--- a/specification/correlationcontext/api.md
+++ b/specification/correlationcontext/api.md
@@ -34,10 +34,9 @@ specification.

 ### Get correlations

-Returns the name/value pairs in the `CorrelationContext`. The order of name/value pairs MUST NOT be
-significant. Based on the language specification, the returned value can be
-either an immutable collection or an immutable iterator to the collection of
-name/value pairs in the `CorrelationContext`.
+Returns the name/value pairs in the `CorrelationContext`.
+The order of name/value pairs MUST NOT be significant.
+Based on the language specification, the returned value can be either an immutable collection or an immutable iterator to the collection of name/value pairs in the `CorrelationContext`.

 OPTIONAL parameters:

@@ -45,10 +44,8 @@ OPTIONAL parameters:

 ### Get correlation

-To access the value for a name/value pair by a prior event, the Correlations API
-SHALL provide a function that takes a context and a name as input, and returns a
-value. Returns the value associated with the given name, or null
-if the given name is not present.
+To access the value for a name/value pair set by a prior event, the Correlations API SHALL provide a function that takes a context and a name as input, and returns a value.
+Returns the value associated with the given name, or null if the given name is not present.

 REQUIRED parameters:

@@ -60,9 +57,8 @@ OPTIONAL parameters:

 ### Set correlation

-To record the value for a name/value pair, the Correlations API SHALL provide a function which
-takes a context, a name, and a value as input. Returns a new `Context` which
-contains a `CorrelationContext` with the new value.
+To record the value for a name/value pair, the Correlations API SHALL provide a function which takes a context, a name, and a value as input.
+Returns a new `Context` which contains a `CorrelationContext` with the new value.

 REQUIRED parameters:

@@ -76,8 +72,8 @@ OPTIONAL parameters:

 ### Remove correlation

-To delete a name/value pair, the Correlations API SHALL provide a function which takes a context
-and a name as input. Returns a new `Context` which no longer contains the selected name.
+To delete a name/value pair, the Correlations API SHALL provide a function which takes a context and a name as input.
+Returns a new `Context` which no longer contains the selected name.

 REQUIRED parameters:

@@ -89,9 +85,8 @@ OPTIONAL parameters:

 ### Clear correlations

-To avoid sending any name/value pairs to an untrusted process, the Correlations API SHALL provide
-a function to remove all Correlations from a context. Returns a new `Context`
-with no correlations.
+To avoid sending any name/value pairs to an untrusted process, the Correlations API SHALL provide a function to remove all Correlations from a context.
+Returns a new `Context` with no correlations.

 OPTIONAL parameters:

@@ -104,5 +99,5 @@ OPTIONAL parameters:

 ## Conflict Resolution

-If a new name/value pair is added and its name is the same as an existing name, than the new pair MUST take precedence. The value
-is replaced with the added value (regardless if it is locally generated or received from a remote peer).
+If a new name/value pair is added and its name is the same as an existing name, then the new pair MUST take precedence.
+The value is replaced with the added value (regardless of whether it is locally generated or received from a remote peer).
diff --git a/specification/glossary.md b/specification/glossary.md
index b3ec6592e07..4684b4669af 100644
--- a/specification/glossary.md
+++ b/specification/glossary.md
@@ -1,7 +1,6 @@
 # Glossary

-Below are a list of some OpenTelemetry specific terms that are used across this
-specification.
+Below is a list of some OpenTelemetry-specific terms that are used across this specification.

 ## Common

@@ -9,8 +8,7 @@

 ### OpenTelemetry SDK

 Denotes the library that implements the *OpenTelemetry API*.

-See [Library Guidelines](library-guidelines.md#sdk-implementation) and
-[Library resource semantic conventions](resource/semantic_conventions/README.md#telemetry-sdk)
+See [Library Guidelines](library-guidelines.md#sdk-implementation) and [Library resource semantic conventions](resource/semantic_conventions/README.md#telemetry-sdk).

 ### Instrumented Library

 Denotes the library for which the telemetry signals (traces, metrics, logs) are gathered.

-The calls to the OpenTelemetry API can be done either by the Instrumented Library itself,
-or by another [Instrumenting Library](#instrumenting_library).
+The calls to the OpenTelemetry API can be done either by the Instrumented Library itself, or by another [Instrumenting Library](#instrumenting_library).

 Example: `org.mongodb.client`.

@@ -28,8 +25,7 @@

 ### Instrumenting Library

 Denotes the library that provides the instrumentation for a given [Instrumented Library](#instrumented_library).
-*Instrumented Library* and *Instrumenting Library* may be the same library
-if it has built-in OpenTelemetry instrumentation.
+*Instrumented Library* and *Instrumenting Library* may be the same library if it has built-in OpenTelemetry instrumentation.

 Example: `io.opentelemetry.contrib.mongodb`.

@@ -39,40 +35,34 @@ Synonyms: *Instrumentation Library*, *Integration*.
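+To make the two roles concrete, here is a minimal, self-contained Go sketch; the `getTracer` helper is a hypothetical stand-in for the real "Obtaining a Tracer" API, and only the naming rule matters:
+
+```golang
+package main
+
+import "fmt"
+
+// getTracer stands in for the "Obtaining a Tracer" API described in
+// trace/api.md; only the naming rule matters for this example.
+func getTracer(name string) string { return name }
+
+func main() {
+	// The Instrumenting Library passes its own name, never the
+	// Instrumented Library's name ("org.mongodb.client").
+	tracer := getTracer("io.opentelemetry.contrib.mongodb")
+	fmt.Println(tracer)
+}
+```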
 ### Tracer Name / Meter Name

-This refers to the `name` and (optional) `version` arguments specified when
-creating a new `Tracer` or `Meter` (see [Obtaining a Tracer](trace/api.md#obtaining-a-tracer)/[Obtaining a Meter](metrics/api-user.md#obtaining-a-meter)). It identifies the [Instrumenting Library](#instrumenting_library).
+This refers to the `name` and (optional) `version` arguments specified when creating a new `Tracer` or `Meter` (see [Obtaining a Tracer](trace/api.md#obtaining-a-tracer)/[Obtaining a Meter](metrics/api-user.md#obtaining-a-meter)).
+It identifies the [Instrumenting Library](#instrumenting_library).

 ### Namespace

-This term applies to [Metric names](metrics/api-user.md#metric-names) only. The namespace is used to disambiguate identical metric
-names used in different [instrumenting libraries](#instrumenting_library). The [Name](#name) provided
-for creating a `Meter` serves as its namespace in addition to the primary semantics
-described [here](#name).
+This term applies to [Metric names](metrics/api-user.md#metric-names) only.
+The namespace is used to disambiguate identical metric names used in different [instrumenting libraries](#instrumenting_library).
+The [Name](#name) provided for creating a `Meter` serves as its namespace in addition to the primary semantics described [here](#name).

-The `version` argument is not relevant here and will not be included in
-the resulting namespace string.
+The `version` argument is not relevant here and will not be included in the resulting namespace string.

 ## Logs

 ### Log Record

-A recording of an event. Typically the record includes a timestamp indicating
-when the event happened as well as other data that describes what happened,
-where it happened, etc.
+A recording of an event.
+Typically, the record includes a timestamp indicating when the event happened as well as other data that describes what happened, where it happened, etc.

 Also known as Log Entry.

 ### Log

-Sometimes used to refer to a collection of Log Records. May be ambiguous, since
-people also sometimes use `Log` to refer to a single `Log Record`, thus this
-term should be used carefully and in the context where ambiguity is possible
-additional qualifiers should be used (e.g. `Log Record`).
+Sometimes used to refer to a collection of Log Records.
+May be ambiguous, since people also sometimes use `Log` to refer to a single `Log Record`; this term should therefore be used carefully, and in contexts where ambiguity is possible, additional qualifiers should be used (e.g. `Log Record`).

 ### Embedded Log

-`Log Records` embedded inside a [Span](trace/api.md#span)
-object, in the [Events](trace/api.md#add-events) list.
+`Log Records` embedded inside a [Span](trace/api.md#span) object, in the [Events](trace/api.md#add-events) list.

 ### Standalone Log

@@ -84,15 +74,11 @@ Key/value pairs contained in a `Log Record`.

 ### Structured Logs

-Logs that are recorded in a format which has a well-defined structure that allows
-to differentiate between different elements of a Log Record (e.g. the Timestamp,
-the Attributes, etc). The _Syslog protocol_ ([RFC 5425](https://tools.ietf.org/html/rfc5424)),
-for example, defines a `structured-data` format.
+Logs that are recorded in a format which has a well-defined structure that allows differentiating between different elements of a Log Record (e.g. the Timestamp, the Attributes, etc.).
+The _Syslog protocol_ ([RFC 5424](https://tools.ietf.org/html/rfc5424)), for example, defines a `structured-data` format.
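+As an illustration only (the field names below are arbitrary, not a schema defined by OpenTelemetry), a JSON-structured Log Record emitted from Go might look like this:
+
+```golang
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"time"
+)
+
+// logRecord shows one possible well-defined structure: the timestamp,
+// body, and attributes are separately addressable elements.
+type logRecord struct {
+	Timestamp  time.Time         `json:"timestamp"`
+	Body       string            `json:"body"`
+	Attributes map[string]string `json:"attributes"`
+}
+
+func main() {
+	rec := logRecord{
+		Timestamp:  time.Date(2020, 3, 1, 12, 0, 0, 0, time.UTC),
+		Body:       "connection refused",
+		Attributes: map[string]string{"peer.host": "db-1"},
+	}
+	b, _ := json.Marshal(rec)
+	fmt.Println(string(b))
+}
+```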
 ### Flat File Logs

-Logs recorded in text files, often one line per log record (although multiline
-records are possible too). There is no common industry agreement whether
-logs written to text files in more structured formats (e.g. JSON files)
-are considered Flat File Logs or not. Where such distinction is important it is
-recommended to call it out specifically.
+Logs recorded in text files, often one line per log record (although multiline records are possible too).
+There is no common industry agreement whether logs written to text files in more structured formats (e.g. JSON files) are considered Flat File Logs or not.
+Where such a distinction is important, it is recommended to call it out specifically.
diff --git a/specification/library-guidelines.md b/specification/library-guidelines.md
index 96803875843..771b31a1b60 100644
--- a/specification/library-guidelines.md
+++ b/specification/library-guidelines.md
@@ -107,11 +107,8 @@ _TODO: How should third party library authors who use OpenTelemetry for instrume

 ### Performance and Blocking

-See the [Performance and Blocking](performance.md) specification for
-guidelines on the performance expectations that API implementations should meet, strategies for meeting these expectations, and a description of how implementations should document their behavior under load.
+See the [Performance and Blocking](performance.md) specification for guidelines on the performance expectations that API implementations should meet, strategies for meeting these expectations, and a description of how implementations should document their behavior under load.

 ### Concurrency and Thread-Safety

-See the [Concurrency and Thread-Safety](concurrency.md) specification for
-guidelines on what concurrency safeties should API implementations provide
-and how they should be documented.
+See the [Concurrency and Thread-Safety](concurrency.md) specification for guidelines on what concurrency safeties API implementations should provide and how they should be documented.
diff --git a/specification/library-layout.md b/specification/library-layout.md
index 34bca2c6749..ebbf936471e 100644
--- a/specification/library-layout.md
+++ b/specification/library-layout.md
@@ -1,8 +1,7 @@
 # OpenTelemetry Project Package Layout

-This documentation serves to document the "look and feel" of a basic layout for OpenTelemetry
-projects. This package layout is intentionally generic and it doesn't try to impose a language
-specific package structure.
+This document describes the "look and feel" of a basic layout for OpenTelemetry projects.
+This package layout is intentionally generic and it doesn't try to impose a language-specific package structure.

 ## API Package

@@ -83,10 +82,8 @@ This directory describes the SDK implementation for api/metrics.

 ### [/sdk/resource](resource/sdk.md)

-The resource directory primarily defines a type [Resource](overview.md#resources) that captures
-information about the entity for which stats or traces are recorded. For example, metrics exposed
-by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod,
-and container name.
+The resource directory primarily defines a type [Resource](overview.md#resources) that captures information about the entity for which stats or traces are recorded.
+For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name.
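+A minimal sketch of that idea, assuming a simplified stand-in `resource` type and illustrative attribute keys (the real type and key names are defined by the SDK and the semantic conventions):
+
+```golang
+package main
+
+import "fmt"
+
+// resource is a simplified stand-in for the SDK Resource type: a set of
+// attributes describing the entity for which telemetry is recorded.
+type resource struct {
+	attributes map[string]string
+}
+
+func main() {
+	// Illustrative attribute keys; authoritative key names live in the
+	// resource semantic conventions, not in this sketch.
+	k8sContainer := resource{attributes: map[string]string{
+		"k8s.cluster.name":   "prod-east",
+		"k8s.namespace.name": "checkout",
+		"k8s.pod.name":       "checkout-0",
+		"container.name":     "server",
+	}}
+	fmt.Println(k8sContainer.attributes)
+}
+```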
 ### `/sdk/correlationcontext`

diff --git a/specification/metrics/api-meter.md b/specification/metrics/api-meter.md
index f7ce7996721..0475a0084d6 100644
--- a/specification/metrics/api-meter.md
+++ b/specification/metrics/api-meter.md
@@ -3,7 +3,5 @@

-This document will be updated as part of the v0.4 milestone with a
-detailed list of `Meter` API functions. These functions are given a
-high-level description in the [overview document](api.md) and
-this document will simply give details on the `Meter` API.
+This document will be updated as part of the v0.4 milestone with a detailed list of `Meter` API functions.
+These functions are given a high-level description in the [overview document](api.md), and this document will simply give details on the `Meter` API.
diff --git a/specification/metrics/api-user.md b/specification/metrics/api-user.md
index e9ea38f8d2d..cfdb42e3204 100644
--- a/specification/metrics/api-user.md
+++ b/specification/metrics/api-user.md
@@ -24,15 +24,9 @@

-Note: This specification for the v0.3 OpenTelemetry milestone does not
-include specification related to the Observer instrument, as described
-in the [overview](api.md). Observer instruments were detailed
-in [OTEP
-72-metric-observer](https://github.com/open-telemetry/oteps/blob/master/text/0072-metric-observer.md)
-and will be added to this document following the v0.3 milestone.
-Gauge instruments will be removed from this specification folowing the
-v0.3 milestone too, as discussed in [OTEP
-80-remove-metric-gauge](https://github.com/open-telemetry/oteps/blob/master/text/0080-remove-metric-gauge.md).
+Note: This specification for the v0.3 OpenTelemetry milestone does not include specification related to the Observer instrument, as described in the [overview](api.md).
+Observer instruments were detailed in [OTEP 72-metric-observer](https://github.com/open-telemetry/oteps/blob/master/text/0072-metric-observer.md) and will be added to this document following the v0.3 milestone.
+Gauge instruments will be removed from this specification following the v0.3 milestone too, as discussed in [OTEP 80-remove-metric-gauge](https://github.com/open-telemetry/oteps/blob/master/text/0080-remove-metric-gauge.md).

 ## Overview

@@ -43,104 +37,75 @@ Metrics are created by calling methods on a `Meter` which is in turn created by

 New `Meter` instances can be created via a `MeterProvider` and its `getMeter` method.
 `MeterProvider`s are generally expected to be used as singletons.
-Implementations SHOULD provide a single global default `MeterProvider`. The `getMeter` method expects two string arguments:
+Implementations SHOULD provide a single global default `MeterProvider`.
+The `getMeter` method expects two string arguments:

-- `name` (required): This name must identify the instrumentation library (also referred to as integration, e.g. `io.opentelemetry.contrib.mongodb`)
-  and *not* the instrumented library.
-  In case an invalid name (null or empty string) is specified, a working default `Meter` implementation is returned as a fallback
-  rather than returning null or throwing an exception.
+- `name` (required): This name must identify the instrumentation library (also referred to as integration, e.g. `io.opentelemetry.contrib.mongodb`) and *not* the instrumented library.
+  In case an invalid name (null or empty string) is specified, a working default `Meter` implementation is returned as a fallback rather than returning null or throwing an exception.
  A `MeterProvider` could also return a no-op `Meter` here if application owners configure the SDK to suppress telemetry produced by this library.
  This name will be used as the `namespace` for any metrics created using the returned `Meter`.
 - `version` (optional): Specifies the version of the instrumentation library (e.g. `semver:1.0.0`).

 ### Metric Instrument names

-Metric instruments have names, which are how we refer to them in
-external systems. Metric instrument names conform to the following syntax:
+Metric instruments have names, which are how we refer to them in external systems.
+Metric instrument names conform to the following syntax:

 1. They are non-empty strings
 2. They are case-insensitive
 3. The first character must be non-numeric, non-space, non-punctuation
 4. Subsequent characters must be belong to the alphanumeric characters, '_', '.', and '-'.

-Metric instrument names belong to a namespace, which is the `name` of the associated `Meter`,
-allowing the same metric name to be used in multiple libraries of code,
-unambiguously, within the same application.
-
-Metric instrument names SHOULD be semantically meaningful, even when viewed
-outside of the context of the originating Meter name. For example, when instrumenting
-an http server library, "latency" is not an appropriate instrument name, as it is too generic.
-Instead, as an example, we should favor a name like "http_request_latency",
-as it would inform the viewer of the semantic meaning of the latency being tracked.
-(Note: this is just an example; actual semantic conventions for instrument naming will
-be tracked elsewhere in the specifications.)
-
-Metric instruments are defined using a `Meter` instance, using a variety
-of `New` methods specific to the kind of metric and type of input (integer
-or floating point). The Meter will return an error when a metric name is
-already registered with a different kind for the same name. Metric systems
-are expected to automatically prefix exported metrics by the namespace, if
-necessary, in a manner consistent with the target system. For example, a
-Prometheus exporter SHOULD use the namespace followed by `_` as the
-[application prefix](https://prometheus.io/docs/practices/naming/#metric-names).
+Metric instrument names belong to a namespace, which is the `name` of the associated `Meter`, allowing the same metric name to be used in multiple libraries of code, unambiguously, within the same application.
+
+Metric instrument names SHOULD be semantically meaningful, even when viewed outside of the context of the originating Meter name.
+For example, when instrumenting an http server library, "latency" is not an appropriate instrument name, as it is too generic.
+Instead, as an example, we should favor a name like "http_request_latency", as it would inform the viewer of the semantic meaning of the latency being tracked.
+(Note: this is just an example; actual semantic conventions for instrument naming will be tracked elsewhere in the specifications.)
+
+Metric instruments are defined using a `Meter` instance, using a variety of `New` methods specific to the kind of metric and type of input (integer or floating point).
+The Meter will return an error when a metric name is already registered with a different kind.
+Metric systems are expected to automatically prefix exported metrics by the namespace, if necessary, in a manner consistent with the target system.
+For example, a Prometheus exporter SHOULD use the namespace followed by `_` as the [application prefix](https://prometheus.io/docs/practices/naming/#metric-names).

 ### Format of a metric event

-Regardless of the instrument kind or method of input, metric events
-include the instrument, a numerical value, and an optional
-set of labels. The instrument, discussed in detail below, contains
-the metric name and various optional settings.
+Regardless of the instrument kind or method of input, metric events include the instrument, a numerical value, and an optional set of labels.
+The instrument, discussed in detail below, contains the metric name and various optional settings.

-Labels are key:value pairs associated with events describing various dimensions
-or categories that describe the event. A "label key" refers to the key
-component while "label value" refers to the correlated value component of a
-label. Label refers to the pair of label key and value. Labels are passed in
-to the metric event at construction time.
+Labels are key:value pairs associated with events, describing various dimensions or categories of the event.
+A "label key" refers to the key component while "label value" refers to the correlated value component of a label.
+Label refers to the pair of label key and value.
+Labels are passed in to the metric event at construction time.

-Metric events always have an associated component name, the name
-passed when constructing the corresponding `Meter`. Metric events are
-associated with the current (implicit or explicit) OpenTelemetry
-context, including distributed correlation context and span context.
+Metric events always have an associated component name, the name passed when constructing the corresponding `Meter`.
+Metric events are associated with the current (implicit or explicit) OpenTelemetry context, including distributed correlation context and span context.

 ### New constructors

-The `Meter` interface allows creating of a registered metric
-instrument using methods specific to each kind of metric. There are
-six constructors representing the three kinds of instrument taking
-either floating point or integer inputs, see the detailed design below.
+The `Meter` interface allows creating a registered metric instrument using methods specific to each kind of metric.
+There are six constructors representing the three kinds of instrument taking either floating point or integer inputs; see the detailed design below.

 Binding instruments to a single `Meter` instance has two benefits:

 1. Instruments can be exported from the zero state, prior to first use, with no explicit `Register` call
 2. The name provided by the `Meter` satisfies a namespace requirement

-The recommended practice is to define structures to contain the
-instruments in use and keep references only to the instruments that
-are specifically needed.
-
-We recognize that many existing metric systems support allocating
-metric instruments statically and providing the `Meter` interface at
-the time of use. In this example, typical of statsd clients, existing
-code may not be structured with a convenient place to store new metric
-instruments. Where this becomes a burden, it is recommended to use
-the global meter provider to construct a static `Meter`, to
-construct metric instruments.
-
-The situation is similar for users of Prometheus clients, where
-instruments are allocated statically and there is an implicit global.
-Such code may not have access to the appropriate `Meter` where
-instruments are defined. Where this becomes a burden, it is
-recommended to use the global meter provider to construct a static
-named `Meter`, to construct metric instruments.
+The recommended practice is to define structures to contain the instruments in use and keep references only to the instruments that are specifically needed.
+
+We recognize that many existing metric systems support allocating metric instruments statically and providing the `Meter` interface at the time of use.
+In this situation, typical of statsd clients, existing code may not be structured with a convenient place to store new metric instruments.
+Where this becomes a burden, it is recommended to use the global meter provider to construct a static `Meter` with which to construct metric instruments.
+
+The situation is similar for users of Prometheus clients, where instruments are allocated statically and there is an implicit global.
+Such code may not have access to the appropriate `Meter` where instruments are defined.
+Where this becomes a burden, it is recommended to use the global meter provider to construct a static named `Meter` with which to construct metric instruments.

 Applications are expected to construct long-lived instruments.
-Instruments are considered permanent for the lifetime of a SDK, there
-is no method to delete them.
+Instruments are considered permanent for the lifetime of an SDK; there is no method to delete them.

 #### Metric instrument constructor example code

-In this Golang example, a struct holding four instruments is built
-using the provided, non-global `Meter` instance.
+In this Golang example, a struct holding four instruments is built using the provided, non-global `Meter` instance.

 ```golang
 type instruments struct {
@@ -160,10 +125,8 @@ func newInstruments(metric.Meter meter) *instruments {
 }
 ```

-Code will be structured to call `newInstruments` somewhere in a
-constructor and keep the `instruments` reference for use at runtime.
-Here's an example of building a server with configured instruments and
-a single metric operation.
+Code will be structured to call `newInstruments` somewhere in a constructor and keep the `instruments` reference for use at runtime.
+Here's an example of building a server with configured instruments and a single metric operation.

 ```golang
 type server struct {
@@ -200,39 +163,24 @@ The metrics API provides three semantically equivalent ways to capture measureme

 - calling unbound metric instruments with labels
 - batch recording without a metric instrument

-All three methods generate equivalent metric events, but offer varying degrees
-of performance and convenience.
+All three methods generate equivalent metric events, but offer varying degrees of performance and convenience.

-This section applies to calling conventions for counter, gauge, and
-measure instruments.
+This section applies to calling conventions for counter, gauge, and measure instruments.

-As described above, metric events consist of an instrument, a set of labels,
-and a numerical value, plus associated context. The performance of a metric
-API depends on the work done to enter a new measurement. One approach to
-reduce cost is to aggregate intermediate results in the SDK, so that subsequent
-events happening in the same collection period, for the same set of labels,
-combine into the same working memory.
+As described above, metric events consist of an instrument, a set of labels, and a numerical value, plus associated context.
+The performance of a metric API depends on the work done to enter a new measurement. +One approach to reduce cost is to aggregate intermediate results in the SDK, so that subsequent events happening in the same collection period, for the same set of labels, combine into the same working memory. -In this document, the term "aggregation" is used to describe the -process of coalescing metric events for a complete set of labels, -whereas "grouping" is used to describe further coalescing aggregate -metric data into a reduced number of key dimensions. SDKs may be -designed to perform aggregation and/or grouping in the process, with -various trade-offs in terms of complexity and performance. +In this document, the term "aggregation" is used to describe the process of coalescing metric events for a complete set of labels, whereas "grouping" is used to describe further coalescing aggregate metric data into a reduced number of key dimensions. +SDKs may be designed to perform aggregation and/or grouping in the process, with various trade-offs in terms of complexity and performance. #### Bound instrument calling convention -In situations where performance is a requirement and a metric instrument is -repeatedly used with the same set of labels, the developer may elect to use the -_bound instrument_ calling convention as an optimization. For bound -instruments to be a benefit, it requires that a specific instrument will be -re-used with specific labels. If an instrument will be used with the same -labels more than once, obtaining a bound instrument corresponding to the labels -ensures the highest performance available. +In situations where performance is a requirement and a metric instrument is repeatedly used with the same set of labels, the developer may elect to use the _bound instrument_ calling convention as an optimization. +For bound instruments to be a benefit, it requires that a specific instrument will be re-used with specific labels. +If an instrument will be used with the same labels more than once, obtaining a bound instrument corresponding to the labels ensures the highest performance available. -To bind an instrument, use the `Bind(labels)` method to return an interface -that supports the `Add()`, `Set()`, or `Record()` method of the instrument in -question. +To bind an instrument, use the `Bind(labels)` method to return an interface that supports the `Add()`, `Set()`, or `Record()` method of the instrument in question. Bound instruments may consume SDK resources indefinitely. @@ -258,10 +206,8 @@ func (s *server) processStream(ctx context.Context) { #### Direct instrument calling convention -When convenience is more important than performance, or there is no re-use to -potentially optimize with bound instruments, users may elect to operate -directly on metric instruments, supplying labels at the call site. This method -offers the greatest convenience possible +When convenience is more important than performance, or there is no re-use to potentially optimize with bound instruments, users may elect to operate directly on metric instruments, supplying labels at the call site. +This method offers the greatest convenience possible For example, to update a single counter: @@ -275,10 +221,8 @@ func (s *server) method(ctx context.Context) { #### RecordBatch calling convention -There is one final API for entering measurements, which is like the direct -access calling convention but supports multiple simultaneous measurements. 
The -use of a RecordBatch API supports entering multiple measurements, implying a -semantically atomic update to several instruments. +There is one final API for entering measurements, which is like the direct access calling convention but supports multiple simultaneous measurements. +The use of a RecordBatch API supports entering multiple measurements, implying a semantically atomic update to several instruments. For example: @@ -294,27 +238,19 @@ func (s *server) method(ctx context.Context) { } ``` -Using the RecordBatch calling convention is semantically identical to -a sequence of direct calls, with the addition of atomicity. Because -values are entered in a single call, -the SDK is potentially able to implement an atomic update, from the -exporter's point of view. Calls to `RecordBatch` may potentially -reduce costs because the SDK can enqueue a single bulk update, or take -a lock only once, for example. +Using the RecordBatch calling convention is semantically identical to a sequence of direct calls, with the addition of atomicity. +Because values are entered in a single call, the SDK is potentially able to implement an atomic update, from the exporter's point of view. +Calls to `RecordBatch` may potentially reduce costs because the SDK can enqueue a single bulk update, or take a lock only once, for example. ##### Missing label keys -When the SDK interprets labels in the context of grouping aggregated values for -an exporter, and where there are keys that are missing, the SDK is required to -consider these values _explicitly unspecified_, a distinct value type of the -exported data model. +When the SDK interprets labels in the context of grouping aggregated values for an exporter, and where there are keys that are missing, the SDK is required to consider these values _explicitly unspecified_, a distinct value type of the exported data model. ##### Option: Ordered labels -As a language-level decision, APIs may support label key ordering. In this -case, the user may specify an ordered sequence of label keys, which is used to -create an unordered set of labels from a sequence of similarly ordered label -values. For example: +As a language-level decision, APIs may support label key ordering. +In this case, the user may specify an ordered sequence of label keys, which is used to create an unordered set of labels from a sequence of similarly ordered label values. +For example: ```golang @@ -327,21 +263,16 @@ for _, input := range stream { } ``` -This is specified as a language-optional feature because its safety, and -therefore its value as an input for monitoring, depends on the availability of -type-checking in the source language. Passing unordered labels (i.e., a -mapping from keys to values) is considered the safer alternative. +This is specified as a language-optional feature because its safety, and therefore its value as an input for monitoring, depends on the availability of type-checking in the source language. +Passing unordered labels (i.e., a mapping from keys to values) is considered the safer alternative. ## Detailed specification -See the [SDK-facing Metrics API](api-meter.md) specification -for an in-depth summary of each method in the Metrics API. +See the [SDK-facing Metrics API](api-meter.md) specification for an in-depth summary of each method in the Metrics API. ### Instrument construction -Instruments are constructed using the appropriate `New` method for the -kind of instrument (Counter, Gauge, Measure) and for the type of input -(integer or floating point). 
+Instruments are constructed using the appropriate `New` method for the kind of instrument (Counter, Gauge, Measure) and for the type of input (integer or floating point).
 
 | `Meter` method | Kind of instrument |
 |-------------------------------------|--------------------|
@@ -353,28 +284,21 @@ kind of instrument (Counter, Gauge, Measure) and for the type of input
 | `NewFloatMeasure(name, options...)` | A floating point measure |
 
 As in all OpenTelemetry specifications, these names are examples.
-Each language committee will decide on the appropriate names based on
-conventions in that language.
+Each language committee will decide on the appropriate names based on conventions in that language.
 
 #### Recommended label keys
 
-Instruments may be defined with a recommended set of label keys. This
-setting may be used by SDKs as a good default for grouping exported
-metrics, where used with pre-aggregation. The recommended label keys
-are usually selected by the developer for exhibiting low cardinality,
-importance for monitoring purposes, and _an intention to provide these
-variables locally_.
+Instruments may be defined with a recommended set of label keys.
+This setting may be used by SDKs as a good default for grouping exported metrics, where used with pre-aggregation.
+The recommended label keys are usually selected by the developer for exhibiting low cardinality, importance for monitoring purposes, and _an intention to provide these variables locally_.
 
-SDKs should consider grouping exported metric data by the recommended label
-keys of each instrument, unless superceded by another form of configuration.
-Recommended keys that are missing will be considered explicitly unspecified, as
-for missing labels in general.
+SDKs should consider grouping exported metric data by the recommended label keys of each instrument, unless superseded by another form of configuration.
+Recommended keys that are missing will be considered explicitly unspecified, as for missing labels in general.
 
 #### Instrument options
 
-Instruments provide several optional settings, summarized here. The
-kind of instrument and input value type are implied by the constructor
-that it used, and the metric name is the only required field.
+Instruments provide several optional settings, summarized here.
+The kind of instrument and input value type are implied by the constructor that is used, and the metric name is the only required field.
 
 | Option | Option name | Explanation |
 |------------------------|---------------------------|-------------|
@@ -384,42 +308,28 @@ that it used, and the metric name is the only required field.
 | Monotonic | WithMonotonic(boolean) | Configure a counter or gauge that accepts only monotonic/non-monotonic updates. |
 | Absolute | WithAbsolute(boolean) | Configure a measure that does or does not accept negative updates. |
 
-See the Metric API [specification overview](api.md) for more
-information about the kind-specific monotonic and absolute options.
+See the Metric API [specification overview](api.md) for more information about the kind-specific monotonic and absolute options.
 
 ### Bound instrument API
 
-Counter, gauge, and measure instruments each support allocating bound
-instruments for the high-performance calling convention. The
-`Instrument.Bind(labels)` method returns an interface which implements the
-`Add()`, `Set()` or `Record()` method, respectively, for counter, gauge, and
-measure instruments.
+Counter, gauge, and measure instruments each support allocating bound instruments for the high-performance calling convention. +The `Instrument.Bind(labels)` method returns an interface which implements the `Add()`, `Set()` or `Record()` method, respectively, for counter, gauge, and measure instruments. ### Direct instrument API -Counter, gauge, and measure instruments support the appropriate -`Add()`, `Set()`, and `Record()` method for submitting individual -metric events. +Counter, gauge, and measure instruments support the appropriate `Add()`, `Set()`, and `Record()` method for submitting individual metric events. ### Interaction with distributed correlation context -As described above, labels are strictly "local". I.e., the application -explicitly declares these labels, whereas distributed correlation context -labels are implicitly associated with the event. - -There is a clear intention to pre-aggregate metrics within the SDK, using -labels to derive grouping keys. There are two available options for users to -apply distributed correlation context to the local grouping function used for -metrics pre-aggregation: - -1. The distributed context, whether implicit or explicit, is - associated with every metric event. The SDK could _automatically_ - project selected label keys from the distributed correlation into the - metric event. -2. The user can explicitly perform the same projection of distributed - correlation into labels by extracting labels from the correlation - context and including them in the call to create the metric or bound - instrument. +As described above, labels are strictly "local". +I.e., the application explicitly declares these labels, whereas distributed correlation context labels are implicitly associated with the event. + +There is a clear intention to pre-aggregate metrics within the SDK, using labels to derive grouping keys. +There are two available options for users to apply distributed correlation context to the local grouping function used for metrics pre-aggregation: + +1. The distributed context, whether implicit or explicit, is associated with every metric event. + The SDK could _automatically_ project selected label keys from the distributed correlation into the metric event. +2. The user can explicitly perform the same projection of distributed correlation into labels by extracting labels from the correlation context and including them in the call to create the metric or bound instrument. An example of an explicit projection follows. diff --git a/specification/metrics/api.md b/specification/metrics/api.md index a8b248b8da1..4d5bdf604fd 100644 --- a/specification/metrics/api.md +++ b/specification/metrics/api.md @@ -35,50 +35,29 @@ ## Overview -The OpenTelemetry Metrics API supports capturing measurements about -the execution of a computer program at run time. The Metrics API is -designed explicitly for processing raw measurements, generally with -the intent to produce continuous summaries of those measurements -simultaneously. Hereafter, "the API" refers to the OpenTelemetry -Metrics API. - -The API provides functions for capturing raw measurements, through -several [calling -conventions](api-user.md#metric-calling-conventions) that -offer different levels of performance. Regardless of calling -convention, we define a _metric event_ as the logical thing that -happens when a new measurement is captured. This moment of capture -(at "run time") defines an implicit timestamp, which is the wall time -an SDK would read from a clock at that moment. 
- -The word "semantic" or "semantics" as used here refers to _how we give -meaning_ to metric events, as they take place under the API. The term -is used extensively in this document to define and explain these API -functions and how we should interpret them. As far as possible, the -terminology used here tries to convey the intended semantics, and a -_standard implementation_ will be described below to help us -understand their meaning. The standard implementation performs -aggregation corresponding to the default interpretation for each kind -of metric event. - -Monitoring and alerting systems commonly use the data provided through -metric events, after applying various [aggregations](#aggregations) -and converting into various [exposition formats](#exposition-formats). -However, we find that there are many other uses for metric events, -such as to record aggregated or raw measurements in tracing and -logging systems. For this reason, [OpenTelemetry requires a -separation of the API from the SDK](../library-guidelines.md#requirements), -so that different SDKs can be configured at run time. +The OpenTelemetry Metrics API supports capturing measurements about the execution of a computer program at run time. +The Metrics API is designed explicitly for processing raw measurements, generally with the intent to produce continuous summaries of those measurements simultaneously. +Hereafter, "the API" refers to the OpenTelemetry Metrics API. + +The API provides functions for capturing raw measurements, through several [calling conventions](api-user.md#metric-calling-conventions) that offer different levels of performance. +Regardless of calling convention, we define a _metric event_ as the logical thing that happens when a new measurement is captured. +This moment of capture (at "run time") defines an implicit timestamp, which is the wall time an SDK would read from a clock at that moment. + +The word "semantic" or "semantics" as used here refers to _how we give meaning_ to metric events, as they take place under the API. +The term is used extensively in this document to define and explain these API functions and how we should interpret them. +As far as possible, the terminology used here tries to convey the intended semantics, and a _standard implementation_ will be described below to help us understand their meaning. +The standard implementation performs aggregation corresponding to the default interpretation for each kind of metric event. + +Monitoring and alerting systems commonly use the data provided through metric events, after applying various [aggregations](#aggregations) and converting into various [exposition formats](#exposition-formats). +However, we find that there are many other uses for metric events, such as to record aggregated or raw measurements in tracing and logging systems. +For this reason, [OpenTelemetry requires a separation of the API from the SDK](../library-guidelines.md#requirements), so that different SDKs can be configured at run time. ### Metric Instruments -A _metric instrument_, of which there are three kinds, is a device for -capturing raw measurements into the API. There are Counter, Measure, -and Observer instruments, each with different semantics and intended -uses, that will be specified here. All measurements captured by the -API are associated with an instrument, which gives the measurement its -properties. Instruments are created and defined through calls to a -`Meter` API, which is the user-facing entry point to the SDK. 
+A _metric instrument_, of which there are three kinds, is a device for capturing raw measurements into the API.
+There are Counter, Measure, and Observer instruments, each with different semantics and intended uses, which will be specified here.
+All measurements captured by the API are associated with an instrument, which gives the measurement its properties.
+Instruments are created and defined through calls to a `Meter` API, which is the user-facing entry point to the SDK.
 
 Each kind of metric instrument has its own semantics, briefly
 described as:
@@ -87,121 +66,76 @@ described as:
 
 - Counter: metric events of this kind _Add_ to a value that is summed over time.
 - Measure: metric events of this kind _Record_ a value that is aggregated over time.
 - Observer: metric events of this kind _Observe_ a coherent set of values at an instant in time.
 
-An _instrument definition_ describes several properties of the
-instrument, including its name and its kind. The other properties of
-a metric instrument are optional, including a description, the unit of
-measurement, and several settings that convey additional meaning
-(e.g., monotonicity). An instrument definition is associated with the
-events that it produces.
+An _instrument definition_ describes several properties of the instrument, including its name and its kind.
+The other properties of a metric instrument are optional, including a description, the unit of measurement, and several settings that convey additional meaning (e.g., monotonicity).
+An instrument definition is associated with the events that it produces.
 
-Details about calling conventions for each kind of instrument are
-covered in the [user-level API specification](api-user.md).
+Details about calling conventions for each kind of instrument are covered in the [user-level API specification](api-user.md).
 
 ### Labels
 
-A _Label_ is the term used to refer to a key-value attribute associated with a
-metric event, similar to a [Span attribute](../trace/api.md#span) in the
-tracing API.
+A _Label_ is the term used to refer to a key-value attribute associated with a metric event, similar to a [Span attribute](../trace/api.md#span) in the tracing API.
 
-Each of the instrument calling conventions detailed in the [user-level API
-specification](api-user.md) accept a set of labels as an argument.
+Each of the instrument calling conventions detailed in the [user-level API specification](api-user.md) accepts a set of labels as an argument.
 
 ### Meter Interface
 
-To produce measurements using an instrument, you need an SDK that implements
-the `Meter` API. This interface consists of a set of instrument constructors,
-and a facilities for capturing batches of measurements in a semantically atomic
-way.
-
-There is a global `Meter` instance available for use that facilitates
-automatic instrumentation for third-party code. Use of this instance
-allows code to statically initialize its metric instruments, without
-explicit dependency injection. The global `Meter` instance acts as a
-no-op implementation until the application explicitly initializes a
-global `Meter` by installing an SDK.
-
-As an obligatory step, the API requires the caller to provide the name
-of the instrumenting library (optionally, the version) when obtaining
-a `Meter` implementation, that is meant to be used for identifying
-instrumentation produced from that library for such purposes as
-disabling instrumentation, configuring aggregation, and applying
-sampling policies. (TODO: refer to the semantic convention on the
-reporting library name).
-
-Details about installing an SDK and obtaining a named `Meter` are
-covered in the [SDK-level API specification](api-meter.md).
+To produce measurements using an instrument, you need an SDK that implements the `Meter` API.
+This interface consists of a set of instrument constructors, and facilities for capturing batches of measurements in a semantically atomic way.
+
+There is a global `Meter` instance available for use that facilitates automatic instrumentation for third-party code.
+Use of this instance allows code to statically initialize its metric instruments, without explicit dependency injection.
+The global `Meter` instance acts as a no-op implementation until the application explicitly initializes a global `Meter` by installing an SDK.
+
+As an obligatory step, the API requires the caller to provide the name of the instrumenting library (optionally, the version) when obtaining a `Meter` implementation; the name is meant to be used for identifying instrumentation produced from that library, for such purposes as disabling instrumentation, configuring aggregation, and applying sampling policies.
+(TODO: refer to the semantic convention on the reporting library name).
+
+Details about installing an SDK and obtaining a named `Meter` are covered in the [SDK-level API specification](api-meter.md).
 
 ### Aggregations
 
-_Aggregation_ refers to the process of combining a large number of
-measurements into exact or estimated statistics about the metric
-events that took place during a window of real time, during program
-execution. Computing aggregations is mainly a subject of the SDK
-specification, with the goal of reducing the amount of data that must
-be sent to the telemetry collection backend.
+_Aggregation_ refers to the process of combining a large number of measurements into exact or estimated statistics about the metric events that took place during a window of real time, during program execution.
+Computing aggregations is mainly a subject of the SDK specification, with the goal of reducing the amount of data that must be sent to the telemetry collection backend.
 
-Users do not have a facility in the API to select the aggregation they
-want for particular instruments. The choice of instrument dictates
-semantics and thus gives a default interpretation. For the standard
-implementation:
+Users do not have a facility in the API to select the aggregation they want for particular instruments.
+The choice of instrument dictates semantics and thus gives a default interpretation.
+For the standard implementation:
 
 - Counter instruments use _Sum_ aggregation
 - Measure instruments use _MinMaxSumCount_ aggregation
 - Observer instruments use _LastValue_ aggregation.
 
-The default Metric SDK specification includes support for configuring
-alternative aggregations, so that metric instruments can be repuposed
-and their data can be examined in different ways. Using the default
-SDK, or an alternate one, we are able to change the interpretation of
-metric events at runtime.
+The default Metric SDK specification includes support for configuring alternative aggregations, so that metric instruments can be repurposed and their data can be examined in different ways.
+Using the default SDK, or an alternate one, we are able to change the interpretation of metric events at runtime.
 
-Other standard aggregations are available, especially for Measure
-instruments, where we are generally interested in a variety of forms
-of statistics, such as histogram and quantile summaries.
+Other standard aggregations are available, especially for Measure instruments, where we are generally interested in a variety of forms of statistics, such as histogram and quantile summaries.
 
 ### Time
 
-Time is a fundamental property of metric events, but not an explicit
-one. Users do not provide explicit timestamps for metric events.
-SDKs are discouraged from capturing the current timestamp for each
-event (by reading from a clock) unless there is a definite need for
-high-precision timestamps.
-
-This non-requirement stems from a common optimization in metrics
-reporting, which is to configure metric data collection with a
-relatively small period (e.g., 1 second) and use a single timestamp to
-describe a batch of exported data, since the loss of precision is
-insignificant when aggregating data across minutes or hours of data.
-
-Aggregations are commonly computed over a series of events that fall
-into a contiguous region of time, known as the collection interval.
-Since the SDK controls the decision to start collection, it is possible to
-collect aggregated metric data while only reading the clock once per
-collection interval. The default SDK takes this approach.
-
-Counter and Measure instruments offer synchronous APIs for capturing
-measurements. Metric events from Counter and Measure instruments are
-captured at the moment they happen, when the SDK receives the
-corresponding function call.
-
-The Observer instrument supports an asynchronous API, allowing the SDK to
-collect metric data on demand, once per collection interval. A single Observer
-instrument callback can capture multiple metric events associated with
-different sets of labels. Semantically, by definition, these observations are
-captured at a single instant in time, the instant that they became the current
-set of last-measured values.
-
-Because metric events are implicitly timestamped, we could refer to a
-series of metric events as a _time series_. However, we reserve the
-use of this term for the SDK specification, to refer to parts of a
-data format that express explicitly timestamped values, in a sequence,
-resulting from an aggregation of raw measurements over time.
+Time is a fundamental property of metric events, but not an explicit one.
+Users do not provide explicit timestamps for metric events.
+SDKs are discouraged from capturing the current timestamp for each event (by reading from a clock) unless there is a definite need for high-precision timestamps.
+
+This non-requirement stems from a common optimization in metrics reporting, which is to configure metric data collection with a relatively small period (e.g., 1 second) and use a single timestamp to describe a batch of exported data, since the loss of precision is insignificant when aggregating across minutes or hours of data.
+
+Aggregations are commonly computed over a series of events that fall into a contiguous region of time, known as the collection interval.
+Since the SDK controls the decision to start collection, it is possible to collect aggregated metric data while only reading the clock once per collection interval.
+The default SDK takes this approach.
+
+Counter and Measure instruments offer synchronous APIs for capturing measurements.
+Metric events from Counter and Measure instruments are captured at the moment they happen, when the SDK receives the corresponding function call.
+
+The Observer instrument supports an asynchronous API, allowing the SDK to collect metric data on demand, once per collection interval.
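+For illustration only, here is a minimal sketch of an Observer callback in the style of the Golang examples in the user-level specification; the `NewIntObserver` constructor, the `IntObserverResult` type, and the `connectionsByProtocol` helper are assumptions made for this sketch, not normative API names:
+
+```golang
+// Registered once at initialization; the SDK invokes the callback
+// on demand, once per collection interval.
+meter.NewIntObserver("server.open_connections", func(result metric.IntObserverResult) {
+	// connectionsByProtocol is a hypothetical helper returning map[string]int64.
+	for protocol, count := range connectionsByProtocol() {
+		result.Observe(count, meter.Labels(key.String("protocol", protocol)))
+	}
+})
+```
+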
+A single Observer instrument callback can capture multiple metric events associated with different sets of labels. +Semantically, by definition, these observations are captured at a single instant in time, the instant that they became the current set of last-measured values. + +Because metric events are implicitly timestamped, we could refer to a series of metric events as a _time series_. +However, we reserve the use of this term for the SDK specification, to refer to parts of a data format that express explicitly timestamped values, in a sequence, resulting from an aggregation of raw measurements over time. ### Metric Event Format -Metric events have the same logical representation, regardless of -kind. Whether a Counter, a Measure, or an Observer instrument, metric -events produced through an instrument consist of: +Metric events have the same logical representation, regardless of kind. +Whether a Counter, a Measure, or an Observer instrument, metric events produced through an instrument consist of: - [Context](../context/context.md) (Span context, Correlation context) - timestamp (implicit to the SDK) @@ -209,138 +143,101 @@ events produced through an instrument consist of: - associated label keys and values - value (a signed integer or floating point number) -This format is the result of separating the API from the SDK--a common -representation for metric events, where the only semantic distinction -is the kind of instrument that was specified by the user. +This format is the result of separating the API from the SDK--a common representation for metric events, where the only semantic distinction is the kind of instrument that was specified by the user. ## Three kinds of instrument -Because the API is separated from the SDK, the implementation -ultimately determines how metric events are handled. Therefore, the -choice of instrument should be guided by semantics and the intended -interpretation. Here we detail the three instruments and their -individual semantics. +Because the API is separated from the SDK, the implementation ultimately determines how metric events are handled. +Therefore, the choice of instrument should be guided by semantics and the intended interpretation. +Here we detail the three instruments and their individual semantics. ### Counter -Counter instruments are used to capture changes in running sums, -synchronously. These are commonly used to monitor rates, and they are -sometimes used to capture totals that rise and fall. An essential -property of Counter instruments is that two events `Add(m)` and -`Add(n)` are semantically equivalent to one event `Add(m+n)`. This -property means that Counter events can be combined inexpensively, by -definition. - -Note that `Add(0)` events are not considered a special case, despite -contributing nothing to a sum. `Add(0)` events MUST be observed by -the SDK in case non-default aggregations are configured for the -instrument. - -Counter instruments can be seen as special cases of Measure -instruments with the additive property described above and a -more-specific verb to improve readability (i.e., "Add" instead of -"Record"). Counter instruments are special cases of Measure -instruments in that they only preserve a Sum, by default, and no other -summary statistics. - -Labels associated with Counter instrument events can be used to -compute rates and totals from the instrument, over selected -dimensions. +Counter instruments are used to capture changes in running sums, synchronously. 
+These are commonly used to monitor rates, and they are sometimes used to capture totals that rise and fall.
+An essential property of Counter instruments is that two events `Add(m)` and `Add(n)` are semantically equivalent to one event `Add(m+n)`.
+This property means that Counter events can be combined inexpensively, by definition.
+
+Note that `Add(0)` events are not considered a special case, despite contributing nothing to a sum.
+`Add(0)` events MUST be observed by the SDK in case non-default aggregations are configured for the instrument.
+
+Counter instruments can be seen as special cases of Measure instruments with the additive property described above and a more-specific verb to improve readability (i.e., "Add" instead of "Record").
+They are special cases in that, by default, they preserve only a Sum and no other summary statistics.
+
+Labels associated with Counter instrument events can be used to compute rates and totals from the instrument, over selected dimensions.
 
 ### Measure
 
-Semantically, metric events from Measure instruments are independent,
-meaning they cannot be combined naturally, as with Counters. Measure
-instruments are used to capture many kinds of information,
-synchronously, and are recommended for all cases that reflect an event
-in the application where the additive property of Counter instruments
-does not apply.
+Semantically, metric events from Measure instruments are independent, meaning they cannot be combined naturally the way Counter events can.
+Measure instruments are used to capture many kinds of information, synchronously, and are recommended for all cases that reflect an event in the application where the additive property of Counter instruments does not apply.
 
-Labels associated with Measure instrument events can be used to
-compute information about the distribution of values from the
-instrument, over selected dimensions. When aggregating Measure
-events, the output statistics are expected to reflect the combined
-data set.
+Labels associated with Measure instrument events can be used to compute information about the distribution of values from the instrument, over selected dimensions.
+When aggregating Measure events, the output statistics are expected to reflect the combined data set.
 
 ### Observer
 
-Observer instruments are used to capture a _current set of values_ at
-a point in time. Observer instruments are asynchronous, with the use
-of callbacks allowing the user to capture multiple values per
-collection interval. Observer instruments are not associated with a
-Context, by definition. This means, for example, it is not possible
-to associate Observer instrument events with Correlation or Span
-context.
-
-Observer instruments capture not only current values, but also effectively
-_which labels are current_ at the moment of collection. These instruments can
-be used to compute probabilities and ratios, because values are part of a set.
-
-Unlike Counter and Measure instruments, Observer instruments are
-synchronized with collection. There is no aggregation across time for
-Observer instruments by definition, only the current set of values is
-semantically defined. Because Observer instruments are activated by
-the SDK, they can be effectively disabled at low cost.
-
-These values are considered coherent, because measurements from an
-Observer instrument in a single collection interval are captured at
-the same logical time.
A single callback invocation generates (zero
-or more) simultaneous metric events, all sharing an implicit timestamp.
+Observer instruments are used to capture a _current set of values_ at a point in time.
+Observer instruments are asynchronous, with the use of callbacks allowing the user to capture multiple values per collection interval.
+Observer instruments are not associated with a Context, by definition.
+This means, for example, it is not possible to associate Observer instrument events with Correlation or Span context.
+
+Observer instruments capture not only current values, but also effectively _which labels are current_ at the moment of collection.
+These instruments can be used to compute probabilities and ratios, because values are part of a set.
+
+Unlike Counter and Measure instruments, Observer instruments are synchronized with collection.
+There is no aggregation across time for Observer instruments by definition; only the current set of values is semantically defined.
+Because Observer instruments are activated by the SDK, they can be effectively disabled at low cost.
+
+These values are considered coherent, because measurements from an Observer instrument in a single collection interval are captured at the same logical time.
+A single callback invocation generates (zero or more) simultaneous metric events, all sharing an implicit timestamp.
 
 ## Interpretation
 
-We believe the three instrument kinds Counter, Measure, and Observer
-form a sufficient basis for expressing nearly all metric data. But if
-the API and SDK are separated, and the SDK can handle any metric event
-as it pleases, why not have just one kind of instrument? How are the
-instruments fundamentally different, and why are there only three?
-
-Establishing different kinds of instrument is important because in
-most cases it allows the SDK to provide good default functionality,
-without requiring alternative behaviors to be configured. The choice
-of instrument determines not only the meaning of the events but also
-the name of the function used to report data. The function
-names--`Add()` for Counter instruments, `Record()` for Measure
-instruments, and `Observe()` for Observer instruments--help convey the
-meaning of these actions.
+We believe the three instrument kinds Counter, Measure, and Observer form a sufficient basis for expressing nearly all metric data.
+But if the API and SDK are separated, and the SDK can handle any metric event as it pleases, why not have just one kind of instrument?
+How are the instruments fundamentally different, and why are there only three?
+
+Establishing different kinds of instrument is important because in most cases it allows the SDK to provide good default functionality, without requiring alternative behaviors to be configured.
+The choice of instrument determines not only the meaning of the events but also the name of the function used to report data.
+The function names--`Add()` for Counter instruments, `Record()` for Measure instruments, and `Observe()` for Observer instruments--help convey the meaning of these actions.
 
 ### Standard implementation
 
 The standard implementation for the three instruments is defined as follows:
 
-1. Counter. The `Add()` function accumulates a total for each distinct set of labels. When aggregating over labels for a Counter, combine using arithmetic addition and export as a sum. Depending on the exposition format, sums are exported either as pairs of labels and cumulative _delta_ or as pairs of labels and cumulative _total_.
-
-2. Measure.
Use the `Record()` function to report events for which the SDK will compute summary statistics about the distribution of values, for each distinct set of labels. The summary statistics to use are determined by the aggregation, but they usually include at least the sum of values, the count of measurements, and the minimum and maximum values. When aggregating distinct Measure events, report summary statistics of the combined value distribution. Exposition formats for summary statistics vary widely, but typically include pairs of labels and (sum, count, minimum and maximum value). - -3. Observer. Current values are provided by the Observer callback at the end of each Metric collection period. When aggregating values _for the same set of labels_, combine using the most-recent value. When aggregating values _for different sets of labels_, combine the value distribution as for Measure instruments. Export as pairs of labels and (sum, count, minimum and maximum value). +1. Counter. The `Add()` function accumulates a total for each distinct set of labels. + When aggregating over labels for a Counter, combine using arithmetic addition and export as a sum. + Depending on the exposition format, sums are exported either as pairs of labels and cumulative _delta_ or as pairs of labels and cumulative _total_. +2. Measure. Use the `Record()` function to report events for which the SDK will compute summary statistics about the distribution of values, for each distinct set of labels. + The summary statistics to use are determined by the aggregation, but they usually include at least the sum of values, the count of measurements, and the minimum and maximum values. + When aggregating distinct Measure events, report summary statistics of the combined value distribution. + Exposition formats for summary statistics vary widely, but typically include pairs of labels and (sum, count, minimum and maximum value). +3. Observer. Current values are provided by the Observer callback at the end of each Metric collection period. + When aggregating values _for the same set of labels_, combine using the most-recent value. + When aggregating values _for different sets of labels_, combine the value distribution as for Measure instruments. + Export as pairs of labels and (sum, count, minimum and maximum value). -We believe that the standard behavior of one of these three -instruments covers nearly all use-cases for users of OpenTelemetry in -terms of the intended semantics. +We believe that the standard behavior of one of these three instruments covers nearly all use-cases for users of OpenTelemetry in terms of the intended semantics. ### Future Work: Option Support -We are aware of a number of reasons to iterate on these -instrumentation kinds, in order to offer: +We are aware of a number of reasons to iterate on these instrumentation kinds, in order to offer: -1. Range restrictions on input data. Instruments accepting negative values is rare in most applications, for example, and it is useful to offer both a semantic declaration (e.g., "negative values are meaningless") and a data validation step (e.g., "negative values should be dropped"). -2. Monotonicity support. When a series of values is known to be monotonic, it is useful to declare this.. +1. Range restrictions on input data. 
+   Instruments that accept negative values are rare in most applications, for example, and it is useful to offer both a semantic declaration (e.g., "negative values are meaningless") and a data validation step (e.g., "negative values should be dropped").
+2. Monotonicity support.
+   When a series of values is known to be monotonic, it is useful to declare this.
 
-For the most part, these behaviors are not necessary for correctness
-within the local process or the SDK, but they are valuable in
-down-stream services that use this data. We look to future work on
-this subject.
+For the most part, these behaviors are not necessary for correctness within the local process or the SDK, but they are valuable in down-stream services that use this data.
+We look to future work on this subject.
 
 ### Future Work: Configurable Aggregations / View API
 
-The API does not support configurable aggregations, in this
-specification. This is a requirement for OpenTelemetry, but there are
-two ways this has been requested.
+In this specification, the API does not support configurable aggregations.
+Configurable aggregation is a requirement for OpenTelemetry, but there are two ways this has been requested.
 
-A _View API_ is defined as an interface to an SDK mechanism that
-supports configuring aggregations, including which operator is applied
-(sum, p99, last-value, etc.) and which dimensions are used.
+A _View API_ is defined as an interface to an SDK mechanism that supports configuring aggregations, including which operator is applied (sum, p99, last-value, etc.) and which dimensions are used.
 
 1. Should the API user be provided with options to configure specific views, statically, in the source?
 2. Should the View API be a stand-alone facility, able to install configurable aggregations, at runtime?
 
@@ -349,161 +246,118 @@ See the [current issue on this topic](https://github.com/open-telemetry/opentele
 
 ## Metric instrument selection
 
-To guide the user in selecting the right kind of metric instrument for
-an application, we'll consider several questions about the kind of
-numbers being reported. Here are some ways to help choose. Examples
-are provided in the following section.
+To guide the user in selecting the right kind of metric instrument for an application, we'll consider several questions about the kind of numbers being reported.
+Here are some ways to help choose.
+Examples are provided in the following section.
 
 ### Counters and Measures compared
 
-Counters and Measures are both recommended for reporting measurements
-taken during synchronous activity, driven by events in the program.
-These measurements include an associated distributed context, the
-effective span context (if any), the correlation context, and
-user-provided LabelSet values.
+Counters and Measures are both recommended for reporting measurements taken during synchronous activity, driven by events in the program.
+These measurements include an associated distributed context, the effective span context (if any), the correlation context, and user-provided LabelSet values.
 
-Start with an application for metrics data in mind. It is useful to
-consider whether you are more likely to be interested in the sum of
-values or any other aggregate value (e.g., average, histogram), as
-processed by the instrument. Counters are useful when only the sum is
-interesting. Measures are useful when the sum and any other kind of
-summary information about the individual values are of interest.
+Start with an application for metrics data in mind.
+It is useful to consider whether you are more likely to be interested in the sum of values or any other aggregate value (e.g., average, histogram), as processed by the instrument.
+Counters are useful when only the sum is interesting.
+Measures are useful when the sum and any other kind of summary information about the individual values are of interest.
 
 If only the sum is of interest, use a Counter instrument.
 
-If you are interested in any other kind of summary value or statistic,
-such as mean, median and other quantiles, or minimum and maximum
-value, use a Measure instrument. Measure instruments are used to
-report any kind of measurement that is not typically expressed as a
-rate or as a total sum.
+If you are interested in any other kind of summary value or statistic, such as mean, median and other quantiles, or minimum and maximum value, use a Measure instrument.
+Measure instruments are used to report any kind of measurement that is not typically expressed as a rate or as a total sum.
 
 ### Observer instruments
 
-Observer instruments are recommended for reporting measurements about
-the state of the program periodically. These expose current
-information about the program itself, not related to individual events
-taking place in the program. Observer instruments are reported
-outside of a context, thus do not have an effective span context or
-correlation context.
+Observer instruments are recommended for reporting measurements about the state of the program periodically.
+These expose current information about the program itself, not related to individual events taking place in the program.
+Observer instruments are reported outside of a context, thus do not have an effective span context or correlation context.
 
-Observer instruments are meant to be used when measured values report
-on the current state of the program, as opposed to an event or a
-change of state in the program.
+Observer instruments are meant to be used when measured values report on the current state of the program, as opposed to an event or a change of state in the program.
 
 ## Examples
 
 ### Reporting total bytes read
 
-You wish to monitor the total number of bytes read from a messaging
-server that supports several protocols. The number of bytes read
-should be labeled with the protocol name and aggregated in the
-process.
+You wish to monitor the total number of bytes read from a messaging server that supports several protocols.
+The number of bytes read should be labeled with the protocol name and aggregated in the process.
 
-This is a typical application for the Counter instrument. Use one Counter for
-capturing the number bytes read. When handling a request, compute a LabelSet
-containing the name of the protocol and potentially other useful labels, then
-call `Add()` with the same labels and the number of bytes read.
+This is a typical application for the Counter instrument.
+Use one Counter for capturing the number of bytes read.
+When handling a request, compute a LabelSet containing the name of the protocol and potentially other useful labels, then call `Add()` with the same labels and the number of bytes read.
 
-To lower the cost of this reporting, you can `Bind()` the instrument with each
-of the supported protocols ahead of time.
+To lower the cost of this reporting, you can `Bind()` the instrument with each of the supported protocols ahead of time.
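+
+A minimal sketch of this pattern follows, in the style of the earlier Golang examples; the `BoundIntCounter` type name, the `key.String` label helper, and the server field name are assumptions made for this sketch:
+
+```golang
+// Bind the counter once per supported protocol, ahead of time.
+func newBoundCounters(meter metric.Meter) map[string]metric.BoundIntCounter {
+	bytesRead := meter.NewIntCounter("server.bytes_read")
+	return map[string]metric.BoundIntCounter{
+		"grpc":  bytesRead.Bind(meter.Labels(key.String("protocol", "grpc"))),
+		"https": bytesRead.Bind(meter.Labels(key.String("protocol", "https"))),
+	}
+}
+
+// In the request handler, update the pre-bound instrument directly.
+func (s *server) onRead(ctx context.Context, proto string, payload []byte) {
+	s.boundBytesRead[proto].Add(ctx, int64(len(payload)))
+}
+```
+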
### Reporting total bytes read and bytes per request -You wish to monitor the total number of bytes read as well as the -number of bytes read per request, to have observability into total -traffic as well as typical request size. As with the example above, -these metric events should be labeled with a protocol name. +You wish to monitor the total number of bytes read as well as the number of bytes read per request, to have observability into total traffic as well as typical request size. +As with the example above, these metric events should be labeled with a protocol name. -This is a typical application for the Measure instrument. Use one -Measure for capturing the number of bytes per request. A sum -aggregation applied to this data yields the total bytes read; other -aggregations allow you to export the minimum and maximum number of -bytes read, as well as the average value, and quantile estimates. +This is a typical application for the Measure instrument. +Use one Measure for capturing the number of bytes per request. +A sum aggregation applied to this data yields the total bytes read; other aggregations allow you to export the minimum and maximum number of bytes read, as well as the average value, and quantile estimates. -In this case, the guidance is to create a single instrument. Do not -create a Counter instrument to export a sum when you want to export -other summary statistics using a Measure instrument. +In this case, the guidance is to create a single instrument. +Do not create a Counter instrument to export a sum when you want to export other summary statistics using a Measure instrument. ### Reporting system call duration -You wish to monitor the duration of a specific system call being made -frequently in your application, with a label to indicate a file name -associated with the operation. +You wish to monitor the duration of a specific system call being made frequently in your application, with a label to indicate a file name associated with the operation. -This is a typical application for the Measure instrument. Use a timer -to measure the duration of each call and `Record()` the measurement -with a label for the file name. +This is a typical application for the Measure instrument. +Use a timer to measure the duration of each call and `Record()` the measurement with a label for the file name. ### Reporting request size -You wish to monitor a trend in request sizes, which means you are -interested in characterizing individual events, as opposed to a sum. -Label these with relevant information that may help explain variance -in request sizes, such as the type of the request. +You wish to monitor a trend in request sizes, which means you are interested in characterizing individual events, as opposed to a sum. +Label these with relevant information that may help explain variance in request sizes, such as the type of the request. -This is a typical application for a Measure instrument. The standard -aggregation for Measure instruments will compute a measurement sum and -the event count, which determines the mean request size, as well as -the minimum and maximum sizes. +This is a typical application for a Measure instrument. +The standard aggregation for Measure instruments will compute a measurement sum and the event count, which determines the mean request size, as well as the minimum and maximum sizes. ### Reporting a per-request finishing account balance There's a number that rises and falls such as a bank account balance. 
-You wish to monitor the average account balance at the end of
-requests, broken down by transaction type (e.g., withdrawal, deposit).
+You wish to monitor the average account balance at the end of requests, broken down by transaction type (e.g., withdrawal, deposit).
 
-Use a Measure instrument to report the current account balance at the
-end of each request. Use a label for the transaction type.
+Use a Measure instrument to report the current account balance at the end of each request.
+Use a label for the transaction type.
 
 ### Reporting process-wide CPU usage
 
-You are interested in reporting the CPU usage of the process as a
-whole, which is computed via a (relatively expensive) system call
-which returns two values, process-lifetime user and system
-cpu-seconds. It is not necessary to update this measurement
-frequently, because it is meant to be used only for accounting
-purposes.
+You are interested in reporting the CPU usage of the process as a whole, which is computed via a (relatively expensive) system call which returns two values, process-lifetime user and system cpu-seconds.
+It is not necessary to update this measurement frequently, because it is meant to be used only for accounting purposes.
 
-A single Observer instrument is recommended for this case, with a
-label value to distinguish user from system CPU time. The Observer
-callback will be called once per collection interval, which lowers the
-cost of collecting this information.
+A single Observer instrument is recommended for this case, with a label value to distinguish user from system CPU time.
+The Observer callback will be called once per collection interval, which lowers the cost of collecting this information.
 
-CPU usage is something that we naturally sum, which raises several
-questions.
+CPU usage is something that we naturally sum, which raises several questions.
 
-- Why not use a Counter instrument? In order to use a Counter instrument, we would need to convert total usage figures into deltas. Calculating deltas from the previous measurement is easy to do, but Counter instruments are not meant to be used from callbacks.
-- Why not report deltas in the Observer callback? Observer instruments are meant to be used to observe current values. Nothing prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct labels. The standard behavior is useful for determining the current rate of CPU usage, but special configuration would be required for an Observer instrument to use Counter aggregation.
+- Why not use a Counter instrument?
+  In order to use a Counter instrument, we would need to convert total usage figures into deltas.
+  Calculating deltas from the previous measurement is easy to do, but Counter instruments are not meant to be used from callbacks.
+- Why not report deltas in the Observer callback?
+  Observer instruments are meant to be used to observe current values.
+  Nothing prevents reporting deltas with an Observer, but the standard aggregation for Observer instruments is to sum the current value across distinct labels.
+  The standard behavior is useful for determining the current rate of CPU usage, but special configuration would be required for an Observer instrument to use Counter aggregation.
 
 ### Reporting per-shard memory holdings
 
-Suppose you have a widely-used library that acts as a client to a
-sharded service. For each shard it maintains some client-side state,
-holding a variable amount of memory per shard.
+Suppose you have a widely-used library that acts as a client to a sharded service.
+For each shard it maintains some client-side state, holding a variable amount of memory per shard.
 
-Observe the current allocation per shard using an Observer instrument with a
-shard label. These can be aggregated across hosts to compute cluster-wide
-memory holdings by shard, for example, using the standard aggregation for
-Observers, which sums the current value across distinct labels.
+Observe the current allocation per shard using an Observer instrument with a shard label.
+These can be aggregated across hosts to compute cluster-wide memory holdings by shard, for example, using the standard aggregation for Observers, which sums the current value across distinct labels.
 
 ### Reporting number of active requests
 
-Suppose your server maintains the count of active requests, which
-rises and falls as new requests begin and end processing.
+Suppose your server maintains the count of active requests, which rises and falls as new requests begin and end processing.
 
-Observe the number of active requests periodically with an Observer
-instrument. Labels can be used to indicate which application-specific
-properties are associated with these events.
+Observe the number of active requests periodically with an Observer instrument.
+Labels can be used to indicate which application-specific properties are associated with these events.
 
 ### Reporting bytes read and written correlated by end user
 
-An application uses storage servers to read and write from some
-underlying media. These requests are made in the context of the end
-user that made the request into the frontend system, with Correlation
-Context passed from the frontend to the storage servers carrying these
-properties.
+An application uses storage servers to read and write from some underlying media.
+These requests are made in the context of the end user that made the request into the frontend system, with Correlation Context passed from the frontend to the storage servers carrying these properties.
 
-Use Counter instruments to report the number of bytes read and written
-by the storage server. Configure the SDK to use a Correltion Context
-label key (e.g., named "app.user") to aggregate events by all metric
-instruments.
+Use Counter instruments to report the number of bytes read and written by the storage server.
+Configure the SDK to use a Correlation Context label key (e.g., named "app.user") to aggregate events for all metric instruments.
 
 diff --git a/specification/overview.md b/specification/overview.md
index 05a90cc2f70..b86030fcf57 100644
--- a/specification/overview.md
+++ b/specification/overview.md
@@ -2,20 +2,14 @@
 
 ## Distributed Tracing
 
-A distributed trace is a set of events, triggered as a result of a single
-logical operation, consolidated across various components of an application. A
-distributed trace contains events that cross process, network and security
-boundaries. A distributed trace may be initiated when someone presses a button
-to start an action on a website - in this example, the trace will represent
-calls made between the downstream services that handled the chain of requests
-initiated by this button being pressed.
+A distributed trace is a set of events, triggered as a result of a single logical operation, consolidated across various components of an application.
+A distributed trace contains events that cross process, network and security boundaries.
+A distributed trace may be initiated when someone presses a button to start an action on a website - in this example, the trace will represent calls made between the downstream services that handled the chain of requests initiated by this button being pressed.
 
 ### Trace
 
-**Traces** in OpenTelemetry are defined implicitly by their **Spans**. In
-particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of
-**Spans**, where the edges between **Spans** are defined as parent/child
-relationship.
+**Traces** in OpenTelemetry are defined implicitly by their **Spans**.
+In particular, a **Trace** can be thought of as a directed acyclic graph (DAG) of **Spans**, where the edges between **Spans** are defined as a parent/child relationship.
 
 For example, the following is an example **Trace** made up of 6 **Spans**:
 
@@ -54,230 +48,178 @@ Each **Span** encapsulates the following state:
 
 - An operation name
 - A start and finish timestamp
-- A set of zero or more key:value **Attributes**. The keys must be strings. The
-  values may be strings, bools, or numeric types.
-- A set of zero or more **Events**, each of which is itself a key:value map
-  paired with a timestamp. The keys must be strings, though the values may be of
-  the same types as Span **Attributes**.
+- A set of zero or more key:value **Attributes**.
+  The keys must be strings.
+  The values may be strings, bools, or numeric types.
+- A set of zero or more **Events**, each of which is itself a key:value map paired with a timestamp.
+  The keys must be strings, though the values may be of the same types as Span **Attributes**.
 - Parent's **Span** identifier.
-- [**Links**](#links-between-spans) to zero or more causally-related **Spans**
-  (via the **SpanContext** of those related **Spans**).
+- [**Links**](#links-between-spans) to zero or more causally-related **Spans** (via the **SpanContext** of those related **Spans**).
 - **SpanContext** identification of a Span. See below.
 
 ### SpanContext
 
-Represents all the information that identifies **Span** in the **Trace** and
-MUST be propagated to child Spans and across process boundaries. A
-**SpanContext** contains the tracing identifiers and the options that are
-propagated from parent to child **Spans**.
-
-- **TraceId** is the identifier for a trace. It is worldwide unique with
-  practically sufficient probability by being made as 16 randomly generated
-  bytes. TraceId is used to group all spans for a specific trace together across
-  all processes.
-- **SpanId** is the identifier for a span. It is globally unique with
-  practically sufficient probability by being made as 8 randomly generated
-  bytes. When passed to a child Span this identifier becomes the parent span id
+Represents all the information that identifies a **Span** in the **Trace** and MUST be propagated to child Spans and across process boundaries.
+A **SpanContext** contains the tracing identifiers and the options that are propagated from parent to child **Spans**.
+
+- **TraceId** is the identifier for a trace.
+  It is worldwide unique with practically sufficient probability by being made as 16 randomly generated bytes.
+  TraceId is used to group all spans for a specific trace together across all processes.
+- **SpanId** is the identifier for a span.
+  It is globally unique with practically sufficient probability by being made as 8 randomly generated bytes.
+  When passed to a child Span this identifier becomes the parent span id
 for the child **Span**.
-- **TraceFlags** represents the options for a trace.
It is represented as 1 - byte (bitmap). - - Sampling bit - Bit to represent whether trace is sampled or not (mask - `0x1`). -- **Tracestate** carries tracing-system specific context in a list of key value - pairs. **Tracestate** allows different vendors propagate additional - information and inter-operate with their legacy Id formats. For more details - see [this](https://w3c.github.io/trace-context/#tracestate-field). +- **TraceFlags** represents the options for a trace. It is represented as 1 byte (bitmap). + - Sampling bit - Bit to represent whether trace is sampled or not (mask `0x1`). +- **Tracestate** carries tracing-system specific context in a list of key value pairs. + **Tracestate** allows different vendors propagate additional information and inter-operate with their legacy Id formats. + For more details see [this](https://w3c.github.io/trace-context/#tracestate-field). ### Links between spans -A **Span** may be linked to zero or more other **Spans** (defined by -**SpanContext**) that are causally related. **Links** can point to -**SpanContexts** inside a single **Trace** or across different **Traces**. -**Links** can be used to represent batched operations where a **Span** was -initiated by multiple initiating **Spans**, each representing a single incoming -item being processed in the batch. - -Another example of using a **Link** is to declare the relationship between -the originating and following trace. This can be used when a **Trace** enters trusted -boundaries of a service and service policy requires the generation of a new -Trace rather than trusting the incoming Trace context. The new linked Trace may -also represent a long running asynchronous data processing operation that was -initiated by one of many fast incoming requests. - -When using the scatter/gather (also called fork/join) pattern, the root -operation starts multiple downstream processing operations and all of them are -aggregated back in a single **Span**. This last **Span** is linked to many -operations it aggregates. All of them are the **Spans** from the same Trace. And -similar to the Parent field of a **Span**. It is recommended, however, to not -set parent of the **Span** in this scenario as semantically the parent field -represents a single parent scenario, in many cases the parent **Span** fully -encloses the child **Span**. This is not the case in scatter/gather and batch -scenarios. +A **Span** may be linked to zero or more other **Spans** (defined by **SpanContext**) that are causally related. +**Links** can point to **SpanContexts** inside a single **Trace** or across different **Traces**. +**Links** can be used to represent batched operations where a **Span** was initiated by multiple initiating **Spans**, each representing a single incoming item being processed in the batch. + +Another example of using a **Link** is to declare the relationship between the originating and following trace. +This can be used when a **Trace** enters trusted boundaries of a service and service policy requires the generation of a new Trace rather than trusting the incoming Trace context. +The new linked Trace may also represent a long running asynchronous data processing operation that was initiated by one of many fast incoming requests. + +When using the scatter/gather (also called fork/join) pattern, the root operation starts multiple downstream processing operations and all of them are aggregated back in a single **Span**. +This last **Span** is linked to many operations it aggregates. 
+All of them are the **Spans** from the same Trace. +And similar to the Parent field of a **Span**. +It is recommended, however, to not set parent of the **Span** in this scenario as semantically the parent field represents a single parent scenario, in many cases the parent **Span** fully encloses the child **Span**. +This is not the case in scatter/gather and batch scenarios. ## Metrics -OpenTelemetry allows to record raw measurements or metrics with predefined -aggregation and set of labels. +OpenTelemetry allows to record raw measurements or metrics with predefined aggregation and set of labels. -Recording raw measurements using OpenTelemetry API allows to defer to end-user -the decision on what aggregation algorithm should be applied for this metric as -well as defining labels (dimensions). It will be used in client libraries like -gRPC to record raw measurements "server_latency" or "received_bytes". So end -user will decide what type of aggregated values should be collected out of these -raw measurements. It may be simple average or elaborate histogram calculation. +Recording raw measurements using OpenTelemetry API allows to defer to end-user the decision on what aggregation algorithm should be applied for this metric as well as defining labels (dimensions). +It will be used in client libraries like gRPC to record raw measurements "server_latency" or "received_bytes". +So end user will decide what type of aggregated values should be collected out of these raw measurements. +It may be simple average or elaborate histogram calculation. -Recording of metrics with the pre-defined aggregation using OpenTelemetry API is -not less important. It allows to collect values like cpu and memory usage, or -simple metrics like "queue length". +Recording of metrics with the pre-defined aggregation using OpenTelemetry API is not less important. +It allows to collect values like cpu and memory usage, or simple metrics like "queue length". ### Recording raw measurements -The main classes used to record raw measurements are `Measure` and -`Measurement`. List of `Measurement`s alongside the additional context can be -recorded using OpenTelemetry API. So user may define to aggregate those -`Measurement`s and use the context passed alongside to define additional -dimensions of the resulting metric. +The main classes used to record raw measurements are `Measure` and `Measurement`. +List of `Measurement`s alongside the additional context can be recorded using OpenTelemetry API. +So user may define to aggregate those `Measurement`s and use the context passed alongside to define additional dimensions of the resulting metric. #### Measure -`Measure` describes the type of the individual values recorded by a library. It -defines a contract between the library exposing the measurements and an -application that will aggregate those individual measurements into a `Metric`. +`Measure` describes the type of the individual values recorded by a library. +It defines a contract between the library exposing the measurements and an application that will aggregate those individual measurements into a `Metric`. `Measure` is identified by name, description and a unit of values. #### Measurement `Measurement` describes a single value to be collected for a `Measure`. -`Measurement` is an empty interface in API surface. This interface is defined in -SDK. +`Measurement` is an empty interface in API surface. +This interface is defined in SDK. 
### Recording metrics with predefined aggregation -The base class for all types of pre-aggregated metrics is called `Metric`. It -defines basic metric properties like a name and labels. Classes inheriting from -the `Metric` define their aggregation type as well as a structure of individual -measurements or Points. API defines the following types of pre-aggregated -metrics: +The base class for all types of pre-aggregated metrics is called `Metric`. +It defines basic metric properties like a name and labels. +Classes inheriting from the `Metric` define their aggregation type as well as a structure of individual measurements or Points. +API defines the following types of pre-aggregated metrics: -- Counter metric to report instantaneous measurement. Counter values can go - up or stay the same, but can never go down. Counter values cannot be - negative. There are two types of counter metric values - `double` and `long`. -- Gauge metric to report instantaneous measurement of a numeric value. Gauges can - go both up and down. The gauges values can be negative. There are two types of - gauge metric values - `double` and `long`. +- Counter metric to report instantaneous measurement. + Counter values can go up or stay the same, but can never go down. + Counter values cannot be negative. + There are two types of counter metric values - `double` and `long`. +- Gauge metric to report instantaneous measurement of a numeric value. + Gauges can go both up and down. + The gauges values can be negative. + There are two types of gauge metric values - `double` and `long`. -API allows to construct the `Metric` of a chosen type. SDK defines the way to -query the current value of a `Metric` to be exported. +API allows to construct the `Metric` of a chosen type. +SDK defines the way to query the current value of a `Metric` to be exported. -Every type of a `Metric` has it's API to record values to be aggregated. API -supports both - push and pull model of setting the `Metric` value. +Every type of a `Metric` has it's API to record values to be aggregated. +API supports both - push and pull model of setting the `Metric` value. ### Metrics data model and SDK -Metrics data model is defined in SDK and is based on -[metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/metrics/v1/metrics.proto). +Metrics data model is defined in SDK and is based on [metrics.proto](https://github.com/open-telemetry/opentelemetry-proto/blob/master/opentelemetry/proto/metrics/v1/metrics.proto). This data model is used by all the OpenTelemetry exporters as an input. -Different exporters have different capabilities (e.g. which data types are -supported) and different constraints (e.g. which characters are allowed in label -keys). Metrics is intended to be a superset of what's possible, not a lowest -common denominator that's supported everywhere. All exporters consume data from -Metrics Data Model via a Metric Producer interface defined in OpenTelemetry SDK. - -Because of this, Metrics puts minimal constraints on the data (e.g. which -characters are allowed in keys), and code dealing with Metrics should avoid -validation and sanitization of the Metrics data. Instead, pass the data to the -backend, rely on the backend to perform validation, and pass back any errors -from the backend. +Different exporters have different capabilities (e.g. which data types are supported) and different constraints (e.g. which characters are allowed in label keys). 
+Metrics is intended to be a superset of what's possible, not a lowest common denominator that's supported everywhere. +All exporters consume data from Metrics Data Model via a Metric Producer interface defined in OpenTelemetry SDK. + +Because of this, Metrics puts minimal constraints on the data (e.g. which characters are allowed in keys), and code dealing with Metrics should avoid validation and sanitization of the Metrics data. +Instead, pass the data to the backend, rely on the backend to perform validation, and pass back any errors from the backend. ## CorrelationContext -In addition to trace propagation, OpenTelemetry provides a simple mechanism for propagating -name/value pairs, called `CorrelationContext`. `CorrelationContext` is intended for -indexing observability events in one service with attributes provided by a prior service in -the same transaction. This helps to establish a causal relationship between these events. +In addition to trace propagation, OpenTelemetry provides a simple mechanism for propagating name/value pairs, called `CorrelationContext`. +`CorrelationContext` is intended for indexing observability events in one service with attributes provided by a prior service in the same transaction. +This helps to establish a causal relationship between these events. The `CorrelationContext` implements the editor's draft of the [W3C Correlation-Context specification](https://w3c.github.io/correlation-context/). -While `CorrelationContext` can be used to prototype other cross-cutting concerns, this mechanism is primarily intended -to convey values for the OpenTelemetry observability systems. +While `CorrelationContext` can be used to prototype other cross-cutting concerns, this mechanism is primarily intended to convey values for the OpenTelemetry observability systems. -These values can be consumed from `CorrelationContext` and used as additional dimensions for metrics, -or additional context for logs and traces. Some examples: +These values can be consumed from `CorrelationContext` and used as additional dimensions for metrics, or additional context for logs and traces. +Some examples: - a web service can benefit from including context around what service has sent the request - a SaaS provider can include context about the API user or token that is responsible for that request - determining that a particular browser version is associated with a failure in an image processing service -For backward compatibility with OpenTracing, Baggage is propagated as `CorrelationContext` when -using the OpenTracing bridge. New concerns with different criteria should consider creating a new -cross-cutting concern to cover their use-case; they may benefit from the W3C encoding format but -use a new HTTP header to convey data throughout a distributed trace. +For backward compatibility with OpenTracing, Baggage is propagated as `CorrelationContext` when using the OpenTracing bridge. +New concerns with different criteria should consider creating a new cross-cutting concern to cover their use-case; they may benefit from the W3C encoding format but use a new HTTP header to convey data throughout a distributed trace. ## Resources -`Resource` captures information about the entity for which telemetry is -recorded. For example, metrics exposed by a Kubernetes container can be linked -to a resource that specifies the cluster, namespace, pod, and container name. +`Resource` captures information about the entity for which telemetry is recorded. 
+For example, metrics exposed by a Kubernetes container can be linked to a resource that specifies the cluster, namespace, pod, and container name. -`Resource` may capture an entire hierarchy of entity identification. It may -describe the host in the cloud and specific container or an application running -in the process. +`Resource` may capture an entire hierarchy of entity identification. +It may describe the host in the cloud and specific container or an application running in the process. -Note, that some of the process identification information can be associated with -telemetry automatically by OpenTelemetry SDK or specific exporter. See -OpenTelemetry -[proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96) -for an example. +Note, that some of the process identification information can be associated with telemetry automatically by OpenTelemetry SDK or specific exporter. +See OpenTelemetry [proto](https://github.com/open-telemetry/opentelemetry-proto/blob/a46c815aa5e85a52deb6cb35b8bc182fb3ca86a0/src/opentelemetry/proto/agent/common/v1/common.proto#L28-L96) for an example. ## Context Propagation -All of OpenTelemetry cross-cutting concerns, such as traces and metrics, -share an underlying `Context` mechanism for storing state and -accessing data across the lifespan of a distributed transaction. +All of OpenTelemetry cross-cutting concerns, such as traces and metrics, share an underlying `Context` mechanism for storing state and accessing data across the lifespan of a distributed transaction. See the [Context](context/context.md) ## Propagators -OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting concern values -such as `SpanContext` and `CorrelationContext` into a `Format`. Currently there is one -type of propagator: +OpenTelemetry uses `Propagators` to serialize and deserialize cross-cutting concern values such as `SpanContext` and `CorrelationContext` into a `Format`. +Currently there is one type of propagator: -- `HTTPTextFormat` which is used to inject and extract a value as text into carriers that travel - in-band across process boundaries. +- `HTTPTextFormat` which is used to inject and extract a value as text into carriers that travel in-band across process boundaries. ## Collector -The OpenTelemetry collector is a set of components that can collect traces, -metrics and eventually other telemetry data (e.g. logs) from processes -instrumented by OpenTelementry or other monitoring/tracing libraries (Jaeger, -Prometheus, etc.), do aggregation and smart sampling, and export traces and -metrics to one or more monitoring/tracing backends. The collector will allow to -enrich and transform collected telemetry (e.g. add additional attributes or -scrub personal information). +The OpenTelemetry collector is a set of components that can collect traces, metrics and eventually other telemetry data (e.g. +logs) from processes instrumented by OpenTelementry or other monitoring/tracing libraries (Jaeger, Prometheus, etc.), do aggregation and smart sampling, and export traces and metrics to one or more monitoring/tracing backends. +The collector will allow to enrich and transform collected telemetry (e.g. +add additional attributes or scrub personal information). -The OpenTelemetry collector has two primary modes of operation: Agent (a daemon -running locally with the application) and Collector (a standalone running -service). 
+The OpenTelemetry collector has two primary modes of operation: Agent (a daemon running locally with the application) and Collector (a standalone running service). -Read more at OpenTelemetry Service [Long-term -Vision](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/vision.md). +Read more at OpenTelemetry Service [Long-term Vision](https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/vision.md). ## Instrumentation adapters -The inspiration of the project is to make every library and application -manageable out of the box by instrumenting it with OpenTelemetry. However on the -way to this goal there will be a need to enable instrumentation by plugging -instrumentation adapters into the library of choice. These adapters can be -wrapping library APIs, subscribing to the library-specific callbacks or -translating telemetry exposed in other formats into OpenTelemetry model. - -Instrumentation adapters may be called different names. It is often referred as -plugin, collector or auto-collector, telemetry module, bridge, etc. It is always -recommended to follow the library and language standards. For instance, if -instrumentation adapter is implemented as "log appender" - it will probably be -called an `appender`, not an instrumentation adapter. However if there is no -established name - the recommendation is to call packages "Instrumentation -Adapter" or simply "Adapter". +The inspiration of the project is to make every library and application manageable out of the box by instrumenting it with OpenTelemetry. +However on the way to this goal there will be a need to enable instrumentation by plugging instrumentation adapters into the library of choice. +These adapters can be wrapping library APIs, subscribing to the library-specific callbacks or translating telemetry exposed in other formats into OpenTelemetry model. + +Instrumentation adapters may be called different names. +It is often referred as plugin, collector or auto-collector, telemetry module, bridge, etc. +It is always recommended to follow the library and language standards. +For instance, if instrumentation adapter is implemented as "log appender" - it will probably be called an `appender`, not an instrumentation adapter. +However if there is no established name - the recommendation is to call packages "Instrumentation Adapter" or simply "Adapter". ## Code injecting adapters @@ -285,14 +227,12 @@ TODO: fill out as a result of SIG discussion. ## Semantic Conventions -OpenTelemetry defines standard names and values of Resource attributes and -Span attributes. +OpenTelemetry defines standard names and values of Resource attributes and Span attributes. * [Resource Conventions](resource/semantic_conventions/README.md) * [Span Conventions](trace/semantic_conventions/README.md) * [Metrics Conventions](metrics/semantic_conventions/README.md) -The type of the attribute SHOULD be specified in the semantic convention -for that attribute. Array values are allowed for attributes. For -protocols that do not natively support array values such values MUST be -represented as JSON strings. +The type of the attribute SHOULD be specified in the semantic convention for that attribute. +Array values are allowed for attributes. +For protocols that do not natively support array values such values MUST be represented as JSON strings. 
diff --git a/specification/protocol/README.md b/specification/protocol/README.md index bf2560c63fe..2b2b1969bc1 100644 --- a/specification/protocol/README.md +++ b/specification/protocol/README.md @@ -1,6 +1,7 @@ # OpenTelemetry Protocol -This is the specification of new OpenTelemetry protocol. This is work in progress. +This is the specification of new OpenTelemetry protocol. +This is work in progress. - [Design Goals](design-goals.md). - [Requirements](requirements.md). diff --git a/specification/protocol/design-goals.md b/specification/protocol/design-goals.md index e51a4d70edf..5444963c23c 100644 --- a/specification/protocol/design-goals.md +++ b/specification/protocol/design-goals.md @@ -10,7 +10,8 @@ We want to design a telemetry data exchange protocol that has the following char - Impose minimal pressure on memory manager, including pass-through scenarios, where deserialized data is short-lived and must be serialized as-is shortly after and where such short-lived data is created and discarded at high frequency (think telemetry data forwarders). -- Support ability to efficiently modify deserialized data and serialize again to pass further. This is related but slightly different from the previous requirement. +- Support ability to efficiently modify deserialized data and serialize again to pass further. + This is related but slightly different from the previous requirement. - Ensure high throughput (within the available bandwidth) in high latency networks (e.g. scenarios where telemetry source and the backend are separated by high latency network). diff --git a/specification/protocol/requirements.md b/specification/protocol/requirements.md index 5d1f1cb6695..6d5505e80f6 100644 --- a/specification/protocol/requirements.md +++ b/specification/protocol/requirements.md @@ -8,7 +8,8 @@ See the goals of OpenTelemetry Protocol design [here](design-goals.md). ## Vocabulary -There are 2 parties involved in telemetry data exchange. In this document the party that is the source of telemetry data is called the Client, the party that is the destination of telemetry data is called the Server. +There are 2 parties involved in telemetry data exchange. +In this document the party that is the source of telemetry data is called the Client, the party that is the destination of telemetry data is called the Server. Examples of a Client are instrumented applications or sending side of telemetry collectors, examples of Servers are telemetry backends or receiving side of telemetry collectors (so a Collector is typically both a Client and a Server depending on which side you look from). @@ -38,27 +39,38 @@ The protocol must support traces and metrics as data types. ### Reliability of Delivery -The protocol must ensure reliable data delivery and clear visibility when the data cannot be delivered. This should be achieved by sending data acknowledgements from the Server to the Client. +The protocol must ensure reliable data delivery and clear visibility when the data cannot be delivered. +This should be achieved by sending data acknowledgements from the Server to the Client. Note that acknowledgements alone are not sufficient to guarantee that: a) no data will be lost and b) no data will be duplicated. Acknowledgements can help to guarantee a) but not b). Guaranteeing both at the same is difficult. Because it is usually preferable for telemetry data to be duplicated than to lose it, we choose to guarantee that there are no data losses while potentially allowing duplicate data. 
-Duplicates can typically happen in edge cases (e.g. on reconnections, network interruptions, etc) when the client has no way of knowing if last sent data was delivered. In these cases the client will usually choose to re-send the data to guarantee the delivery which in turn may result in duplicate data on the server side. +Duplicates can typically happen in edge cases (e.g. +on reconnections, network interruptions, etc) when the client has no way of knowing if last sent data was delivered. +In these cases the client will usually choose to re-send the data to guarantee the delivery which in turn may result in duplicate data on the server side. -_To avoid having duplicates the client and the server could track sent and delivered items using uniquely identifying ids. The exact mechanism for tracking the ids and performing data de-duplication may be defined at the layer above the protocol layer and is outside the scope of this document._ +_To avoid having duplicates the client and the server could track sent and delivered items using uniquely identifying ids. +The exact mechanism for tracking the ids and performing data de-duplication may be defined at the layer above the protocol layer and is outside the scope of this document._ For this reason we have slightly relaxed requirements and consider duplicate data acceptable in rare cases. -Note: this protocol is concerned with reliability of delivery between one pair of client/server nodes and aims to ensure that no data is lost in-transit between the client and the server. Many telemetry collection systems have multiple nodes that the data must travel across until reaching the final destination (e.g. application -> agent -> collector -> backend). End-to-end delivery guarantees in such systems is outside of the scope for this document. The acknowledgements described in this protocol happen between a single client/server pair and do not span multiple nodes in multi-hop delivery paths. +Note: this protocol is concerned with reliability of delivery between one pair of client/server nodes and aims to ensure that no data is lost in-transit between the client and the server. +Many telemetry collection systems have multiple nodes that the data must travel across until reaching the final destination (e.g. +application -> agent -> collector -> backend). +End-to-end delivery guarantees in such systems is outside of the scope for this document. +The acknowledgements described in this protocol happen between a single client/server pair and do not span multiple nodes in multi-hop delivery paths. ### Throughput The protocol must ensure high throughput in high latency networks when the client and the server are not in the same data center. -This requirement may rule out half-duplex protocols. The throughput of half-duplex protocols is highly dependent on network roundtrip time and request size. To achieve good throughput request sizes may be too large to be practical. +This requirement may rule out half-duplex protocols. +The throughput of half-duplex protocols is highly dependent on network roundtrip time and request size. +To achieve good throughput request sizes may be too large to be practical. ### Compression -The protocol must achieve high compression ratios for telemetry data. The protocol design must consider batching of telemetry data and grouping of similar data (both can help to achieve better compression using common compression algorithms). +The protocol must achieve high compression ratios for telemetry data. 
+The protocol design must consider batching of telemetry data and grouping of similar data (both can help to achieve better compression using common compression algorithms). ### Encryption @@ -68,9 +80,14 @@ Industry standard encryption (e.g. TLS/HTTPS) must be supported. The protocol must allow backpressure signalling. -If the server is unable to keep up with the pace of data it receives from the client then it must be able to signal that fact to the client. The client may then throttle itself to avoid overwhelming the server. +If the server is unable to keep up with the pace of data it receives from the client then it must be able to signal that fact to the client. +The client may then throttle itself to avoid overwhelming the server. -If the underlying transport is a stream that has its own flow control mechanism then the backpressure could be applied by delaying the reading of data from the server’s endpoint which could then be signalled to the client via underlying flow-control. However this approach makes it difficult for the client to distinguish server overloading from network delays (due to e.g. network losses). Such distinction is important for [observability reasons](https://github.com/open-telemetry/opentelemetry-service/pull/188). Because of this it is required for the protocol to allow to explicitly and clearly signal backpressure from the server to the client without relying on implicit signalling using underlying flow-control mechanisms. +If the underlying transport is a stream that has its own flow control mechanism then the backpressure could be applied by delaying the reading of data from the server’s endpoint which could then be signalled to the client via underlying flow-control. +However this approach makes it difficult for the client to distinguish server overloading from network delays (due to e.g. +network losses). +Such distinction is important for [observability reasons](https://github.com/open-telemetry/opentelemetry-service/pull/188). +Because of this it is required for the protocol to allow to explicitly and clearly signal backpressure from the server to the client without relying on implicit signalling using underlying flow-control mechanisms. The backpressure signal should include a hint to the client about desirable reduced rate of data. @@ -78,7 +95,8 @@ The backpressure signal should include a hint to the client about desirable redu The protocol must have fast data serialization and deserialization characteristics. -Ideally it must also support very fast pass-through mode (when no modifications to the data are needed), fast “augmenting” or “tagging” of data and partial inspection of data (e.g. check for presence of specific tag). These requirements help to create fast Agents and Collectors. +Ideally it must also support very fast pass-through mode (when no modifications to the data are needed), fast “augmenting” or “tagging” of data and partial inspection of data (e.g. check for presence of specific tag). +These requirements help to create fast Agents and Collectors. ### Memory Usage Profile @@ -88,11 +106,13 @@ The implementation of telemetry protocol must aim to minimize the number of memo ### Level 7 Load Balancer Friendly -The protocol must allow Level 7 load balancers such as Envoy to re-balance the traffic for each batch of telemetry data. The traffic should not get pinned by a load balancer to one server for the entire duration of telemetry data sending, thus potentially leading to imbalanced load of servers located behind the load balancer. 
+The protocol must allow Level 7 load balancers such as Envoy to re-balance the traffic for each batch of telemetry data. +The traffic should not get pinned by a load balancer to one server for the entire duration of telemetry data sending, thus potentially leading to imbalanced load of servers located behind the load balancer. ### Backwards Compatibility -The protocol should be possible to evolve over time. It should be possible for nodes that implement different versions of OpenTelemetry protocol to interoperate (while possibly regressing to the lowest common denominator from functional perspective). +The protocol should be possible to evolve over time. +It should be possible for nodes that implement different versions of OpenTelemetry protocol to interoperate (while possibly regressing to the lowest common denominator from functional perspective). ### General Requirements diff --git a/specification/resource/sdk.md b/specification/resource/sdk.md index cf2fca9cf73..42af30e1e42 100644 --- a/specification/resource/sdk.md +++ b/specification/resource/sdk.md @@ -1,62 +1,48 @@ # Resource SDK -A [Resource](../overview.md#resources) is an immutable representation of the entity producing -telemetry. For example, a process producing telemetry that is running in a -container on Kubernetes has a Pod name, it is in a namespace and possibly is -part of a Deployment which also has a name. All three of these attributes can be -included in the `Resource`. - -The primary purpose of resources as a first-class concept in the SDK is -decoupling of discovery of resource information from exporters. This allows for -independent development and easy customization for users that need to integrate -with closed source environments. The SDK MUST allow for creation of `Resources` and -for associating them with telemetry. - -When used with distributed tracing, a resource can be associated with the -[TracerProvider](../trace/sdk.md#tracer-sdk) when it is created. +A [Resource](../overview.md#resources) is an immutable representation of the entity producing telemetry. +For example, a process producing telemetry that is running in a container on Kubernetes has a Pod name, it is in a namespace and possibly is part of a Deployment which also has a name. +All three of these attributes can be included in the `Resource`. + +The primary purpose of resources as a first-class concept in the SDK is decoupling of discovery of resource information from exporters. +This allows for independent development and easy customization for users that need to integrate with closed source environments. +The SDK MUST allow for creation of `Resources` and for associating them with telemetry. + +When used with distributed tracing, a resource can be associated with the [TracerProvider](../trace/sdk.md#tracer-sdk) when it is created. That association cannot be changed later. -When associated with a `TracerProvider`, -all `Span`s produced by any `Tracer` from the provider MUST be associated with this `Resource`. +When associated with a `TracerProvider`, all `Span`s produced by any `Tracer` from the provider MUST be associated with this `Resource`. -Analogous to distributed tracing, when used with metrics, -a resource can be associated with a `MeterProvider`. -When associated with a [`MeterProvider`](../metrics/api-user.md#obtaining-a-meter), -all `Metrics` produced by any `Meter` from the provider will be -associated with this `Resource`. +Analogous to distributed tracing, when used with metrics, a resource can be associated with a `MeterProvider`. 
+When associated with a [`MeterProvider`](../metrics/api-user.md#obtaining-a-meter), all `Metrics` produced by any `Meter` from the provider will be associated with this `Resource`. ## Resource creation -The SDK must support two ways to instantiate new resources. Those are: +The SDK must support two ways to instantiate new resources. +Those are: ### Create -The interface MUST provide a way to create a new resource, from a collection of -attributes. Examples include a factory method or a constructor for a resource -object. A factory method is recommended to enable support for cached objects. +The interface MUST provide a way to create a new resource, from a collection of attributes. +Examples include a factory method or a constructor for a resource object. +A factory method is recommended to enable support for cached objects. Required parameters: -- a collection of name/value attributes, where name is a string and value can be one - of: string, int64, double, bool. +- a collection of name/value attributes, where name is a string and value can be one of: string, int64, double, bool. ### Merge -The interface MUST provide a way for a primary resource and a -secondary resource to be merged into a new resource. +The interface MUST provide a way for a primary resource and a secondary resource to be merged into a new resource. -Note: This is intended to be utilized for merging of resources whose attributes -come from different sources, -such as environment variables, or metadata extracted from the host or container. +Note: This is intended to be utilized for merging of resources whose attributes come from different sources, such as environment variables, or metadata extracted from the host or container. The resulting resource MUST have all attributes that are on any of the two input resources. -Conflicts (i.e. a key for which attributes exist on both the primary and secondary resource) -MUST be handled as follows: +Conflicts (i.e. a key for which attributes exist on both the primary and secondary resource) MUST be handled as follows: * If the value on the primary resource is an empty string, the result has the value of the secondary resource. * Otherwise, the value of the primary resource is used. -Attribute key namespacing SHOULD be used to prevent collisions across different -resource detection steps. +Attribute key namespacing SHOULD be used to prevent collisions across different resource detection steps. Required parameters: @@ -65,26 +51,21 @@ Required parameters: ### The empty resource -It is recommended, but not required, to provide a way to quickly create an empty -resource. +It is recommended, but not required, to provide a way to quickly create an empty resource. -Note that the OpenTelemetry project documents certain ["standard -attributes"](semantic_conventions/README.md) that have prescribed semantic meanings. +Note that the OpenTelemetry project documents certain ["standard attributes"](semantic_conventions/README.md) that have prescribed semantic meanings. ## Resource operations -Resources are immutable. Thus, in addition to resource creation, -only the following operations should be provided: +Resources are immutable. +Thus, in addition to resource creation, only the following operations should be provided: ### Retrieve attributes -The SDK should provide a way to retrieve a read only collection of attributes -associated with a resource. The attributes should consist of the name and values, -both of which should be strings. 
+The SDK should provide a way to retrieve a read only collection of attributes associated with a resource. +The attributes should consist of the name and values, both of which should be strings. There is no need to guarantee the order of the attributes. -The most common operation when retrieving attributes is to enumerate over them. As -such, it is recommended to optimize the resulting collection for fast -enumeration over other considerations such as a way to quickly retrieve a value -for a attribute with a specific key. +The most common operation when retrieving attributes is to enumerate over them. +As such, it is recommended to optimize the resulting collection for fast enumeration over other considerations such as a way to quickly retrieve a value for a attribute with a specific key. diff --git a/specification/resource/semantic_conventions/README.md b/specification/resource/semantic_conventions/README.md index 0d23cd0cbd5..1d7a39c609c 100644 --- a/specification/resource/semantic_conventions/README.md +++ b/specification/resource/semantic_conventions/README.md @@ -1,6 +1,8 @@ # Resource Semantic Conventions -This document defines standard attributes for resources. These attributes are typically used in the [Resource](../sdk.md) and are also recommended to be used anywhere else where there is a need to describe a resource in a consistent manner. The majority of these attributes are inherited from +This document defines standard attributes for resources. +These attributes are typically used in the [Resource](../sdk.md) and are also recommended to be used anywhere else where there is a need to describe a resource in a consistent manner. +The majority of these attributes are inherited from [OpenCensus Resource standard](https://github.com/census-instrumentation/opencensus-specs/blob/master/resource/StandardResources.md). - [Service](#service) @@ -27,9 +29,14 @@ This document defines standard attributes for resources. These attributes are ty ### Document Conventions -Attributes are grouped logically by the type of the concept that they described. Attributes in the same group have a common prefix that ends with a dot. For example all attributes that describe Kubernetes properties start with "k8s." +Attributes are grouped logically by the type of the concept that they described. +Attributes in the same group have a common prefix that ends with a dot. +For example all attributes that describe Kubernetes properties start with "k8s." -Certain attribute groups in this document have a **Required** column. For these groups if any attribute from the particular group is present in the Resource then all attributes that are marked as Required MUST be also present in the Resource. However it is also valid if the entire attribute group is omitted (i.e. none of the attributes from the particular group are present even though some of them are marked as Required in this document). +Certain attribute groups in this document have a **Required** column. +For these groups if any attribute from the particular group is present in the Resource then all attributes that are marked as Required MUST be also present in the Resource. +However it is also valid if the entire attribute group is omitted (i.e. +none of the attributes from the particular group are present even though some of them are marked as Required in this document). ## Service @@ -64,12 +71,9 @@ service.name = Shop.shoppingcart **Description:** The telemetry SDK used to capture data recorded by the instrumentation libraries. 
-The default OpenTelemetry SDK provided by the OpenTelemetry project MUST set `telemetry.sdk.name` -to the value `opentelemetry`. +The default OpenTelemetry SDK provided by the OpenTelemetry project MUST set `telemetry.sdk.name` to the value `opentelemetry`. -If another SDK, like a fork or a vendor-provided implementation, is used, this SDK MUST set the attribute -`telemetry.sdk.name` to the fully-qualified class or module name of this SDK's main entry point -or another suitable identifier depending on the language. +If another SDK, like a fork or a vendor-provided implementation, is used, this SDK MUST set the attribute `telemetry.sdk.name` to the fully-qualified class or module name of this SDK's main entry point or another suitable identifier depending on the language. The identifier `opentelemetry` is reserved and MUST NOT be used in this case. The identifier SHOULD be stable across different versions of an implementation. @@ -169,8 +173,7 @@ Attributes defining a running environment (e.g. Cloud, Data Center). ## Version Attributes -Version attributes such as `service.version` and `library.version` are values of type `string`, -with naming schemas hinting at the type of a version, such as the following: +Version attributes such as `service.version` and `library.version` are values of type `string`, with naming schemas hinting at the type of a version, such as the following: - `semver:1.2.3` (a semantic version) - `git:8ae73a` (a git sha hash) diff --git a/specification/sdk-configuration.md b/specification/sdk-configuration.md index 2839d8225db..2e0cce3b6a8 100644 --- a/specification/sdk-configuration.md +++ b/specification/sdk-configuration.md @@ -10,10 +10,9 @@ ## Abstract -The default Open Telemetry SDK (hereafter referred to as "The SDK") -is highly configurable. This specification outlines the mechanisms by -which the SDK can be configured. It does -not attempt to specify the details of what can be configured. +The default Open Telemetry SDK (hereafter referred to as "The SDK") is highly configurable. +This specification outlines the mechanisms by which the SDK can be configured. +It does not attempt to specify the details of what can be configured. ## Configuration Interface @@ -30,6 +29,5 @@ consumable by the programatic interface. ### Other Mechanisms -Additional configuration mechanisms SHOULD be provided in whatever -language/format/style is idiomatic for the language of the SDK. The -SDK can include as many configuration mechanisms as appropriate. +Additional configuration mechanisms SHOULD be provided in whatever language/format/style is idiomatic for the language of the SDK. +The SDK can include as many configuration mechanisms as appropriate. diff --git a/specification/trace/api.md b/specification/trace/api.md index c55143973e0..b16ab713fb6 100644 --- a/specification/trace/api.md +++ b/specification/trace/api.md @@ -39,13 +39,12 @@ Table of Contents Tracing API consist of a few main classes: - `Tracer` is used for all operations. See [Tracer](#tracer) section. -- `Span` is a mutable object storing information about the current operation - execution. See [Span](#span) section. +- `Span` is a mutable object storing information about the current operation execution. + See [Span](#span) section. ## Data types -While languages and platforms have different ways of representing data, -this section defines some generic requirements for this API. +While languages and platforms have different ways of representing data, this section defines some generic requirements for this API. 
### Time @@ -68,55 +67,35 @@ A duration is the elapsed time between two events. ## Tracer -The OpenTelemetry library achieves in-process context propagation of `Span`s by -way of the `Tracer`. +The OpenTelemetry library achieves in-process context propagation of `Span`s by way of the `Tracer`. -The `Tracer` is responsible for tracking the currently active `Span`, and -exposes functions for creating and activating new `Span`s. The `Tracer` is -configured with `Propagator`s which support transferring span context across -process boundaries. +The `Tracer` is responsible for tracking the currently active `Span`, and exposes functions for creating and activating new `Span`s. +The `Tracer` is configured with `Propagator`s which support transferring span context across process boundaries. ### Obtaining a Tracer -New `Tracer` instances can be created via a `TracerProvider` and its `getTracer` -function. This function expects two string arguments: - -`TracerProvider`s are generally expected to be used as singletons. Implementations -SHOULD provide a single global default `TracerProvider`. - -Some applications may use multiple `TracerProvider` instances, e.g. to provide -different settings (e.g. `SpanProcessor`s) to each of those instances and - -in further consequence - to the `Tracer` instances created by them. - -- `name` (required): This name must identify the instrumentation library (also - referred to as integration, e.g. `io.opentelemetry.contrib.mongodb`) and *not* - the instrumented library. - In case an invalid name (null or empty string) is specified, a working - default Tracer implementation as a fallback is returned rather than returning - null or throwing an exception. - A library, implementing the OpenTelemetry API *may* also ignore this name and - return a default instance for all calls, if it does not support "named" - functionality (e.g. an implementation which is not even observability-related). - A TracerProvider could also return a no-op Tracer here if application owners configure - the SDK to suppress telemetry produced by this library. -- `version` (optional): Specifies the version of the instrumentation library - (e.g. `semver:1.0.0`). - -Implementations might require the user to specify configuration properties at -`TracerProvider` creation time, or rely on external configuration, e.g. when using the -provider pattern. +New `Tracer` instances can be created via a `TracerProvider` and its `getTracer` function. +This function expects two string arguments: + +`TracerProvider`s are generally expected to be used as singletons. +Implementations SHOULD provide a single global default `TracerProvider`. + +Some applications may use multiple `TracerProvider` instances, e.g. to provide different settings (e.g. `SpanProcessor`s) to each of those instances and - in further consequence - to the `Tracer` instances created by them. + +- `name` (required): This name must identify the instrumentation library (also referred to as integration, e.g. `io.opentelemetry.contrib.mongodb`) and *not* the instrumented library. + In case an invalid name (null or empty string) is specified, a working default Tracer implementation as a fallback is returned rather than returning null or throwing an exception. + A library, implementing the OpenTelemetry API *may* also ignore this name and return a default instance for all calls, if it does not support "named" functionality (e.g. an implementation which is not even observability-related). 
+ A TracerProvider could also return a no-op Tracer here if application owners configure the SDK to suppress telemetry produced by this library. +- `version` (optional): Specifies the version of the instrumentation library (e.g. `semver:1.0.0`). + +Implementations might require the user to specify configuration properties at `TracerProvider` creation time, or rely on external configuration, e.g. when using the provider pattern. #### Runtimes with multiple deployments/applications -Runtimes that support multiple deployments or applications might need to -provide a different `TracerProvider` instance to each deployment. To support this, -the global `TracerProvider` registry may delegate calls to create new instances of -`TracerProvider` to a separate `Provider` component, and the runtime may include -its own `Provider` implementation which returns a different `TracerProvider` for -each deployment. +Runtimes that support multiple deployments or applications might need to provide a different `TracerProvider` instance to each deployment. +To support this, the global `TracerProvider` registry may delegate calls to create new instances of `TracerProvider` to a separate `Provider` component, and the runtime may include its own `Provider` implementation which returns a different `TracerProvider` for each deployment. -`Provider` instances are registered with the API via some language-specific -mechanism, for instance the `ServiceLoader` class in Java. +`Provider` instances are registered with the API via some language-specific mechanism, for instance the `ServiceLoader` class in Java. ### Tracer operations @@ -129,75 +108,56 @@ The `Tracer` SHOULD provide methods to: - Get the currently active `Span` - Make a given `Span` as active -The `Tracer` MUST internally leverage the `Context` in order to get and set the -current `Span` state and how `Span`s are passed across process boundaries. +The `Tracer` MUST internally leverage the `Context` in order to get and set the current `Span` state and how `Span`s are passed across process boundaries. -When getting the current span, the `Tracer` MUST return a placeholder `Span` -with an invalid `SpanContext` if there is no currently active `Span`. +When getting the current span, the `Tracer` MUST return a placeholder `Span` with an invalid `SpanContext` if there is no currently active `Span`. -When creating a new `Span`, the `Tracer` MUST allow the caller to specify the -new `Span`'s parent in the form of a `Span` or `SpanContext`. The `Tracer` -SHOULD create each new `Span` as a child of its active `Span` unless an -explicit parent is provided or the option to create a span without a parent is -selected, or the current active `Span` is invalid. +When creating a new `Span`, the `Tracer` MUST allow the caller to specify the new `Span`'s parent in the form of a `Span` or `SpanContext`. +The `Tracer` SHOULD create each new `Span` as a child of its active `Span` unless an explicit parent is provided or the option to create a span without a parent is selected, or the current active `Span` is invalid. -The `Tracer` SHOULD provide a way to update its active `Span` and MAY provide -convenience functions to manage a `Span`'s lifetime and the scope in which a -`Span` is active. When an active `Span` is made inactive, the previously-active -`Span` SHOULD be made active. A `Span` maybe finished (i.e. have a non-null end -time) but still active. A `Span` may be active on one thread after it has been -made inactive on another. 
+The `Tracer` SHOULD provide a way to update its active `Span` and MAY provide convenience functions to manage a `Span`'s lifetime and the scope in which a `Span` is active. +When an active `Span` is made inactive, the previously-active `Span` SHOULD be made active. +A `Span` maybe finished (i.e. have a non-null end time) but still active. +A `Span` may be active on one thread after it has been made inactive on another. ## SpanContext -A `SpanContext` represents the portion of a `Span` which must be serialized and -propagated along side of a distributed context. `SpanContext`s are immutable. +A `SpanContext` represents the portion of a `Span` which must be serialized and propagated along side of a distributed context. +`SpanContext`s are immutable. `SpanContext` MUST be a final (sealed) class. -The OpenTelemetry `SpanContext` representation conforms to the [w3c TraceContext -specification](https://www.w3.org/TR/trace-context/). It contains two -identifiers - a `TraceId` and a `SpanId` - along with a set of common -`TraceFlags` and system-specific `TraceState` values. +The OpenTelemetry `SpanContext` representation conforms to the [w3c TraceContext specification](https://www.w3.org/TR/trace-context/). +It contains two identifiers - a `TraceId` and a `SpanId` - along with a set of common `TraceFlags` and system-specific `TraceState` values. -`TraceId` A valid trace identifier is a 16-byte array with at least one -non-zero byte. +`TraceId` A valid trace identifier is a 16-byte array with at least one non-zero byte. -`SpanId` A valid span identifier is an 8-byte array with at least one non-zero -byte. +`SpanId` A valid span identifier is an 8-byte array with at least one non-zero byte. -`TraceFlags` contain details about the trace. Unlike Tracestate values, -TraceFlags are present in all traces. Currently, the only `TraceFlags` is a -boolean `sampled` -[flag](https://www.w3.org/TR/trace-context/#trace-flags). +`TraceFlags` contain details about the trace. +Unlike Tracestate values, TraceFlags are present in all traces. +Currently, the only `TraceFlags` is a boolean `sampled` [flag](https://www.w3.org/TR/trace-context/#trace-flags). -`Tracestate` carries system-specific configuration data, represented as a list -of key-value pairs. TraceState allows multiple tracing systems to participate in -the same trace. +`Tracestate` carries system-specific configuration data, represented as a list of key-value pairs. +TraceState allows multiple tracing systems to participate in the same trace. -`IsValid` is a boolean flag which returns true if the SpanContext has a non-zero -TraceID and a non-zero SpanID. +`IsValid` is a boolean flag which returns true if the SpanContext has a non-zero TraceID and a non-zero SpanID. -`IsRemote` is a boolean flag which returns true if the SpanContext was propagated -from a remote parent. +`IsRemote` is a boolean flag which returns true if the SpanContext was propagated from a remote parent. When creating children from remote spans, their IsRemote flag MUST be set to false. -Please review the W3C specification for details on the [Tracestate -field](https://www.w3.org/TR/trace-context/#tracestate-field). +Please review the W3C specification for details on the [Tracestate field](https://www.w3.org/TR/trace-context/#tracestate-field). ## Span -A `Span` represents a single operation within a trace. Spans can be nested to -form a trace tree. 
Each trace contains a root span, which typically describes -the end-to-end latency and, optionally, one or more sub-spans for its -sub-operations. +A `Span` represents a single operation within a trace. +Spans can be nested to form a trace tree. +Each trace contains a root span, which typically describes the end-to-end latency and, optionally, one or more sub-spans for its sub-operations. `Span`s encapsulate: - The span name -- An immutable [`SpanContext`](#spancontext) that uniquely identifies the - `Span` -- A parent span in the form of a [`Span`](#span), [`SpanContext`](#spancontext), - or null +- An immutable [`SpanContext`](#spancontext) that uniquely identifies the `Span` +- A parent span in the form of a [`Span`](#span), [`SpanContext`](#spancontext), or null - A [`SpanKind`](#spankind) - A start timestamp - An end timestamp @@ -206,13 +166,9 @@ sub-operations. - A list of timestamped [`Event`s](#add-events) - A [`Status`](#set-status). -The _span name_ is a human-readable string which concisely identifies the work -represented by the Span, for example, an RPC method name, a function name, -or the name of a subtask or stage within a larger computation. The span name -should be the most general string that identifies a (statistically) interesting -_class of Spans_, rather than individual Span instances. That is, "get_user" is -a reasonable name, while "get_user/314159", where "314159" is a user ID, is not -a good name due to its high cardinality. +The _span name_ is a human-readable string which concisely identifies the work represented by the Span, for example, an RPC method name, a function name, or the name of a subtask or stage within a larger computation. +The span name should be the most general string that identifies a (statistically) interesting _class of Spans_, rather than individual Span instances. +That is, "get_user" is a reasonable name, while "get_user/314159", where "314159" is a user ID, is not a good name due to its high cardinality. For example, here are potential span names for an endpoint that gets a hypothetical account information: @@ -224,67 +180,56 @@ hypothetical account information: | `get_account` | Good, and account_id=42 would make a nice Span attribute | | `get_account/{accountId}` | Also good (using the "HTTP route") | -The `Span`'s start and end timestamps reflect the elapsed real time of the -operation. A `Span`'s start time SHOULD be set to the current time on [span -creation](#span-creation). After the `Span` is created, it SHOULD be possible to -change the its name, set its `Attribute`s, and add `Link`s and `Event`s. These -MUST NOT be changed after the `Span`'s end time has been set. +The `Span`'s start and end timestamps reflect the elapsed real time of the operation. +A `Span`'s start time SHOULD be set to the current time on [span creation](#span-creation). +After the `Span` is created, it SHOULD be possible to change the its name, set its `Attribute`s, and add `Link`s and `Event`s. +These MUST NOT be changed after the `Span`'s end time has been set. -`Span`s are not meant to be used to propagate information within a process. To -prevent misuse, implementations SHOULD NOT provide access to a `Span`'s -attributes besides its `SpanContext`. +`Span`s are not meant to be used to propagate information within a process. +To prevent misuse, implementations SHOULD NOT provide access to a `Span`'s attributes besides its `SpanContext`. Vendors may implement the `Span` interface to effect vendor-specific logic. 
-However, alternative implementations MUST NOT allow callers to create `Span`s -directly. All `Span`s MUST be created via a `Tracer`. +However, alternative implementations MUST NOT allow callers to create `Span`s directly. +All `Span`s MUST be created via a `Tracer`. ### Span Creation -Implementations MUST provide a way to create `Span`s via a `Tracer`. By default, -the currently active `Span` is set as the new `Span`'s parent. The `Tracer` -MAY provide other default options for newly created `Span`s. +Implementations MUST provide a way to create `Span`s via a `Tracer`. +By default, the currently active `Span` is set as the new `Span`'s parent. +The `Tracer` MAY provide other default options for newly created `Span`s. -`Span` creation MUST NOT set the newly created `Span` as the currently -active `Span` by default, but this functionality MAY be offered additionally -as a separate operation. +`Span` creation MUST NOT set the newly created `Span` as the currently active `Span` by default, but this functionality MAY be offered additionally as a separate operation. The API MUST accept the following parameters: - The span name. This is a required parameter. -- The parent `Span` or a `Context` containing a parent `Span` or `SpanContext`, - and whether the new `Span` should be a root `Span`. API MAY also have an - option for implicit parenting from the current context as a default behavior. - See [Determining the Parent Span from a Context](#determining-the-parent-span-from-a-context) - for guidance on `Span` parenting from explicit and implicit `Context`s. +- The parent `Span` or a `Context` containing a parent `Span` or `SpanContext`, and whether the new `Span` should be a root `Span`. + API MAY also have an option for implicit parenting from the current context as a default behavior. + See [Determining the Parent Span from a Context](#determining-the-parent-span-from-a-context) for guidance on `Span` parenting from explicit and implicit `Context`s. - [`SpanKind`](#spankind), default to `SpanKind.Internal` if not specified. -- `Attribute`s - A collection of key-value pairs, with the same semantics as - the ones settable with [Span::SetAttributes](#set-attributes). Additionally, - these attributes may be used to make a sampling decision as noted in [sampling - description](sdk.md#sampling). An empty collection will be assumed if - not specified. +- `Attribute`s - A collection of key-value pairs, with the same semantics as the ones settable with [Span::SetAttributes](#set-attributes). + Additionally, these attributes may be used to make a sampling decision as noted in [sampling description](sdk.md#sampling). + An empty collection will be assumed if not specified. Whenever possible, users SHOULD set any already known attributes at span creation instead of calling `SetAttribute` later. -- `Link`s - see API definition [here](#add-links). Empty list will be assumed if - not specified. -- `Start timestamp`, default to current time. This argument SHOULD only be set - when span creation time has already passed. If API is called at a moment of - a Span logical start, API user MUST not explicitly set this argument. - -Each span has zero or one parent span and zero or more child spans, which -represent causally related operations. A tree of related spans comprises a -trace. A span is said to be a _root span_ if it does not have a parent. Each -trace includes a single root span, which is the shared ancestor of all other -spans in the trace. 
Implementations MUST provide an option to create a `Span` as
-a root span, and MUST generate a new `TraceId` for each root span created.
+- `Link`s - see API definition [here](#add-links).
+  Empty list will be assumed if not specified.
+- `Start timestamp`, default to current time.
+  This argument SHOULD only be set when span creation time has already passed.
+  If the API is called at the moment of a Span's logical start, the API user MUST NOT explicitly set this argument.
+
+Each span has zero or one parent span and zero or more child spans, which represent causally related operations.
+A tree of related spans comprises a trace.
+A span is said to be a _root span_ if it does not have a parent.
+Each trace includes a single root span, which is the shared ancestor of all other spans in the trace.
+Implementations MUST provide an option to create a `Span` as a root span, and MUST generate a new `TraceId` for each root span created.
 For a Span with a parent, the `TraceId` MUST be the same as the parent.
 Also, the child span MUST inherit all `TraceState` values of its parent by default.

-A `Span` is said to have a _remote parent_ if it is the child of a `Span`
-created in another process. Each propagators' deserialization must set
-`IsRemote` to true on a parent `SpanContext` so `Span` creation knows if the
-parent is remote.
+A `Span` is said to have a _remote parent_ if it is the child of a `Span` created in another process.
+Each propagator's deserialization must set `IsRemote` to true on a parent `SpanContext` so `Span` creation knows if the parent is remote.

 #### Determining the Parent Span from a Context

@@ -303,69 +248,56 @@ The parent should be selected in the following order of precedence:

 #### Add Links

-During the `Span` creation user MUST have the ability to record links to other `Span`s. Linked
-`Span`s can be from the same or a different trace. See [Links
-description](../overview.md#links-between-spans).
+During `Span` creation, the user MUST have the ability to record links to other `Span`s.
+Linked `Span`s can be from the same or a different trace.
+See [Links description](../overview.md#links-between-spans).

 A `Link` is defined by the following properties:

 - (Required) `SpanContext` of the `Span` to link to.
-- (Optional) One or more `Attribute`s with the same restrictions as defined for
-  [Span Attributes](#set-attributes).
+- (Optional) One or more `Attribute`s with the same restrictions as defined for [Span Attributes](#set-attributes).

 The `Link` SHOULD be an immutable type.

 The Span creation API should provide:

-- An API to record a single `Link` where the `Link` properties are passed as
-  arguments. This MAY be called `AddLink`.
-- An API to record a single `Link` whose attributes or attribute values are
-  lazily constructed, with the intention of avoiding unnecessary work if a link
-  is unused. If the language supports overloads then this SHOULD be called
-  `AddLink` otherwise `AddLazyLink` MAY be considered. In some languages, it might
-  be easier to defer `Link` or attribute creation entirely by providing a wrapping
-  class or function that returns a `Link` or formatted attributes. When providing
-  a wrapping class or function it SHOULD be named `LinkFormatter`.
+- An API to record a single `Link` where the `Link` properties are passed as arguments.
+  This MAY be called `AddLink`.
+- An API to record a single `Link` whose attributes or attribute values are lazily constructed, with the intention of avoiding unnecessary work if a link is unused.
+  If the language supports overloads, then this SHOULD be called `AddLink`; otherwise `AddLazyLink` MAY be considered.
+  In some languages, it might be easier to defer `Link` or attribute creation entirely by providing a wrapping class or function that returns a `Link` or formatted attributes.
+  When providing a wrapping class or function, it SHOULD be named `LinkFormatter`.

 Links SHOULD preserve the order in which they're set.

 ### Span operations

-With the exception of the function to retrieve the `Span`'s `SpanContext` and
-recording status, none of the below may be called after the `Span` is finished.
+With the exception of the function to retrieve the `Span`'s `SpanContext` and recording status, none of the below may be called after the `Span` is finished.

 #### Get Context

 The Span interface MUST provide:

-- An API that returns the `SpanContext` for the given `Span`. The returned value
-  may be used even after the `Span` is finished. The returned value MUST be the
-  same for the entire Span lifetime. This MAY be called `GetContext`.
+- An API that returns the `SpanContext` for the given `Span`.
+  The returned value may be used even after the `Span` is finished.
+  The returned value MUST be the same for the entire Span lifetime.
+  This MAY be called `GetContext`.

 #### IsRecording

-Returns true if this `Span` is recording information like events with the
-`AddEvent` operation, attributes using `SetAttributes`, status with `SetStatus`,
-etc.
+Returns true if this `Span` is recording information like events with the `AddEvent` operation, attributes using `SetAttributes`, status with `SetStatus`, etc.

 There should be no parameter.

-This flag SHOULD be used to avoid expensive computations of a Span attributes or
-events in case when a Span is definitely not recorded. Note that any child
-span's recording is determined independently from the value of this flag
-(typically based on the `sampled` flag of a `TraceFlag` on
-[SpanContext](#spancontext)).
+This flag SHOULD be used to avoid expensive computation of Span attributes or events when a Span is definitely not recorded.
+Note that any child span's recording is determined independently from the value of this flag (typically based on the `sampled` flag of a `TraceFlag` on [SpanContext](#spancontext)).

-This flag may be `true` despite the entire trace being sampled out. This
-allows to record and process information about the individual Span without
-sending it to the backend. An example of this scenario may be recording and
-processing of all incoming requests for the processing and building of
-SLA/SLO latency charts while sending only a subset - sampled spans - to the
-backend. See also the [sampling section of SDK design](sdk.md#sampling).
+This flag may be `true` despite the entire trace being sampled out.
+This allows recording and processing information about the individual Span without sending it to the backend.
+An example of this scenario is recording and processing all incoming requests to build SLA/SLO latency charts while sending only a subset - sampled spans - to the backend.
+See also the [sampling section of SDK design](sdk.md#sampling).

-Users of the API should only access the `IsRecording` property when
-instrumenting code and never access `SampledFlag` unless used in context
-propagators.
+Users of the API should only access the `IsRecording` property when instrumenting code and never access `SampledFlag` unless used in context propagators.
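+
+As an illustration of using `IsRecording` to skip expensive attribute computation, here is a minimal, non-normative Go sketch; the `Span` interface, method names, and attribute key below are stand-ins, not prescribed API:
+
+```go
+package example
+
+// Span is a stand-in for the API's span type; real method names and
+// signatures are defined by each language library.
+type Span interface {
+	IsRecording() bool
+	SetAttribute(key string, value interface{})
+}
+
+// expensiveSummary stands for a computation that is only worth doing
+// when the span actually records data.
+func expensiveSummary() string { return "..." }
+
+func handleRequest(span Span) {
+	// Skip the costly attribute computation when the span is not recorded.
+	if span.IsRecording() {
+		span.SetAttribute("request.summary", expensiveSummary())
+	}
+	// The request itself is processed regardless of the recording flag.
+}
+```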
 #### Set Attributes

 An `Attribute` is defined by the following properties:

 - (Required) The attribute key, which MUST be a non-`null` and non-empty string.
 - (Required) The attribute value, which is either:
   - A primitive type: string, boolean or numeric.
-  - An array of primitive type values. The array MUST be homogeneous,
-    i.e. it MUST NOT contain values of different types.
+  - An array of primitive type values.
+    The array MUST be homogeneous, i.e. it MUST NOT contain values of different types.

 The Span interface MUST provide:

-- An API to set a single `Attribute` where the attribute properties are passed
-  as arguments. This MAY be called `SetAttribute`. To avoid extra allocations some
-  implementations may offer a separate API for each of the possible value types.
+- An API to set a single `Attribute` where the attribute properties are passed as arguments.
+  This MAY be called `SetAttribute`.
+  To avoid extra allocations, some implementations may offer a separate API for each of the possible value types.

-Attributes SHOULD preserve the order in which they're set. Setting an attribute
-with the same key as an existing attribute SHOULD overwrite the existing
-attribute's value.
+Attributes SHOULD preserve the order in which they're set.
+Setting an attribute with the same key as an existing attribute SHOULD overwrite the existing attribute's value.

-Attribute values expressing a numerical value of zero or an empty string are
-considered meaningful and MUST be stored and passed on to span processors / exporters.
-Attribute values of `null` are considered to be not set and get discarded as if
-that `SetAttribute` call had never been made.
-As an exception to this, if overwriting of values is supported, this results in
-clearing the previous value and dropping the attribute key from the set of attributes.
+Attribute values expressing a numerical value of zero or an empty string are considered meaningful and MUST be stored and passed on to span processors / exporters.
+Attribute values of `null` are considered to be not set and get discarded as if that `SetAttribute` call had never been made.
+As an exception to this, if overwriting of values is supported, this results in clearing the previous value and dropping the attribute key from the set of attributes.

-`null` values within arrays MUST be preserved as-is (i.e., passed on to span
-processors / exporters as `null`). If exporters do not support exporting `null`
-values, they MAY replace those values by 0, `false`, or empty strings.
-This is required for map/dictionary structures represented as two arrays with
-indices that are kept in sync (e.g., two attributes `header_keys` and `header_values`,
-both containing an array of strings to represent a mapping
-`header_keys[i] -> header_values[i]`).
+`null` values within arrays MUST be preserved as-is (i.e., passed on to span processors / exporters as `null`).
+If exporters do not support exporting `null` values, they MAY replace those values with 0, `false`, or empty strings.
+This is required for map/dictionary structures represented as two arrays with indices that are kept in sync (e.g., two attributes `header_keys` and `header_values`, both containing an array of strings to represent a mapping `header_keys[i] -> header_values[i]`).

-Note that the OpenTelemetry project documents certain ["standard
-attributes"](semantic_conventions/README.md) that have prescribed semantic meanings.
+Note that the OpenTelemetry project documents certain ["standard attributes"](semantic_conventions/README.md) that have prescribed semantic meanings.
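+
+As a non-normative illustration of these rules, a Go-flavored sketch follows; the `Span` interface, the `SetAttribute` signature, and the key names are examples only:
+
+```go
+package example
+
+// Span is a stand-in for the API's span type (illustration only).
+type Span interface {
+	SetAttribute(key string, value interface{})
+}
+
+func setAttributes(span Span) {
+	span.SetAttribute("http.status_code", 200) // a primitive value
+	span.SetAttribute("retry.count", 0)        // zero is meaningful and is kept
+	span.SetAttribute("peer.service", "")      // an empty string is meaningful too
+	// Two homogeneous string arrays kept in sync to represent the mapping
+	// header_keys[i] -> header_values[i].
+	span.SetAttribute("header_keys", []string{"accept", "x-tenant"})
+	span.SetAttribute("header_values", []string{"application/json", "acme"})
+}
+```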
+Note that the OpenTelemetry project documents certain ["standard attributes"](semantic_conventions/README.md) that have prescribed semantic meanings. #### Add Events -A `Span` MUST have the ability to add events. Events have a time associated -with the moment when they are added to the `Span`. +A `Span` MUST have the ability to add events. +Events have a time associated with the moment when they are added to the `Span`. An `Event` is defined by the following properties: - (Required) Name of the event. -- (Optional) One or more `Attribute`s with the same restrictions as defined for - [Span Attributes](#set-attributes). +- (Optional) One or more `Attribute`s with the same restrictions as defined for [Span Attributes](#set-attributes). - (Optional) Timestamp for the event. The `Event` SHOULD be an immutable type. The Span interface MUST provide: -- An API to record a single `Event` where the `Event` properties are passed as - arguments. This MAY be called `AddEvent`. -- An API to record a single `Event` whose attributes or attribute values are - lazily constructed, with the intention of avoiding unnecessary work if an event - is unused. If the language supports overloads then this SHOULD be called - `AddEvent` otherwise `AddLazyEvent` MAY be considered. In some languages, it - might be easier to defer `Event` or attribute creation entirely by providing a - wrapping class or function that returns an `Event` or formatted attributes. When - providing a wrapping class or function it SHOULD be named `EventFormatter`. +- An API to record a single `Event` where the `Event` properties are passed as arguments. + This MAY be called `AddEvent`. +- An API to record a single `Event` whose attributes or attribute values are lazily constructed, with the intention of avoiding unnecessary work if an event is unused. + If the language supports overloads then this SHOULD be called `AddEvent` otherwise `AddLazyEvent` MAY be considered. + In some languages, it might be easier to defer `Event` or attribute creation entirely by providing a wrapping class or function that returns an `Event` or formatted attributes. + When providing a wrapping class or function it SHOULD be named `EventFormatter`. -Events SHOULD preserve the order in which they're set. This will typically match -the ordering of the events' timestamps. +Events SHOULD preserve the order in which they're set. +This will typically match the ordering of the events' timestamps. -Note that the OpenTelemetry project documents certain ["standard event names and -keys"](semantic_conventions/README.md) which have prescribed semantic meanings. +Note that the OpenTelemetry project documents certain ["standard event names and keys"](semantic_conventions/README.md) which have prescribed semantic meanings. #### Set Status -Sets the [`Status`](#status) of the `Span`. If used, this will override the -default `Span` status, which is `OK`. +Sets the [`Status`](#status) of the `Span`. +If used, this will override the default `Span` status, which is `OK`. -Only the value of the last call will be recorded, and implementations are free -to ignore previous calls. +Only the value of the last call will be recorded, and implementations are free to ignore previous calls. The Span interface MUST provide: -- An API to set the `Status` where the new status is the only argument. This - SHOULD be called `SetStatus`. +- An API to set the `Status` where the new status is the only argument. + This SHOULD be called `SetStatus`. #### UpdateName -Updates the `Span` name. 
-name will depend on the implementation.
+Updates the `Span` name.
+Upon this update, any sampling behavior based on `Span` name will depend on the implementation.

 It is highly discouraged to update the name of a `Span` after its creation.
-`Span` name is often used to group, filter and identify the logical groups of
-spans. And often, filtering logic will be implemented before the `Span` creation
-for performance reasons. Thus the name update may interfere with this logic.
+`Span` name is often used to group, filter and identify the logical groups of spans.
+Often, filtering logic will be implemented before `Span` creation for performance reasons.
+Thus the name update may interfere with this logic.

-The function name is called `UpdateName` to differentiate this function from the
-regular property setter. It emphasizes that this operation signifies a major
-change for a `Span` and may lead to re-calculation of sampling or filtering
-decisions made previously depending on the implementation.
+The function is called `UpdateName` to differentiate it from a regular property setter.
+It emphasizes that this operation signifies a major change for a `Span` and may, depending on the implementation, lead to re-calculation of sampling or filtering decisions made previously.

-Alternatives for the name update may be late `Span` creation, when Span is
-started with the explicit timestamp from the past at the moment where the final
-`Span` name is known, or reporting a `Span` with the desired name as a child
-`Span`.
+Alternatives to the name update are late `Span` creation, where the Span is started with an explicit timestamp from the past once the final `Span` name is known, or reporting a `Span` with the desired name as a child `Span`.

 Required parameters:

-- The new **span name**, which supersedes whatever was passed in when the
-  `Span` was started
+- The new **span name**, which supersedes whatever was passed in when the `Span` was started

 #### End

-Finish the `Span`. This call will take the current timestamp to set as `Span`'s
-end time. Implementations MUST ignore all subsequent calls to `End` (there might
-be exceptions when Tracer is streaming event and has no mutable state associated
-with the `Span`).
+Finish the `Span`.
+This call will take the current timestamp and set it as the `Span`'s end time.
+Implementations MUST ignore all subsequent calls to `End` (there might be exceptions when the Tracer is streaming events and has no mutable state associated with the `Span`).

 Call to `End` of a `Span` MUST not have any effects on child spans. Those may
 still be running and can be ended later.

@@ -495,25 +405,21 @@ This API MUST be non-blocking.

 ### Span lifetime

-Span lifetime represents the process of recording the start and the end
-timestamps to the Span object:
+Span lifetime represents the process of recording the start and the end timestamps to the Span object:

 - The start time is recorded when the Span is created.
 - The end time needs to be recorded when the operation is ended.

-Start and end time as well as Event's timestamps MUST be recorded at a time of a
-calling of corresponding API.
+Start and end times, as well as Event timestamps, MUST be recorded at the time the corresponding API is called.

 ## Status

-`Status` interface represents the status of a finished `Span`. It's composed of
-a canonical code in conjunction with an optional descriptive message.
+The `Status` interface represents the status of a finished `Span`.
+It's composed of a canonical code in conjunction with an optional descriptive message.

 ### StatusCanonicalCode

-`StatusCanonicalCode` represents the canonical set of status codes of a finished
-`Span`, following the [Standard GRPC
-codes](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md):
+`StatusCanonicalCode` represents the canonical set of status codes of a finished `Span`, following the [Standard GRPC codes](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md):

 - `Ok`
   - The operation completed successfully.
@@ -522,42 +428,37 @@ codes](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md):
 - `Unknown`
   - An unknown error.
 - `InvalidArgument`
-  - Client specified an invalid argument. Note that this differs from
-    `FailedPrecondition`. `InvalidArgument` indicates arguments that are problematic
-    regardless of the state of the system.
+  - Client specified an invalid argument.
+    Note that this differs from `FailedPrecondition`.
+    `InvalidArgument` indicates arguments that are problematic regardless of the state of the system.
 - `DeadlineExceeded`
-  - Deadline expired before operation could complete. For operations that change the
-    state of the system, this error may be returned even if the operation has
-    completed successfully.
+  - Deadline expired before operation could complete.
+    For operations that change the state of the system, this error may be returned even if the operation has completed successfully.
 - `NotFound`
   - Some requested entity (e.g., file or directory) was not found.
 - `AlreadyExists`
   - Some entity that we attempted to create (e.g., file or directory) already exists.
 - `PermissionDenied`
   - The caller does not have permission to execute the specified operation.
-    `PermissionDenied` must not be used if the caller cannot be identified (use
-    `Unauthenticated1` instead for those errors).
+    `PermissionDenied` must not be used if the caller cannot be identified (use `Unauthenticated` instead for those errors).
 - `ResourceExhausted`
-  - Some resource has been exhausted, perhaps a per-user quota, or perhaps the
-    entire file system is out of space.
+  - Some resource has been exhausted, perhaps a per-user quota, or perhaps the entire file system is out of space.
 - `FailedPrecondition`
-  - Operation was rejected because the system is not in a state required for the
-    operation's execution.
+  - Operation was rejected because the system is not in a state required for the operation's execution.
 - `Aborted`
-  - The operation was aborted, typically due to a concurrency issue like sequencer
-    check failures, transaction aborts, etc.
+  - The operation was aborted, typically due to a concurrency issue like sequencer check failures, transaction aborts, etc.
 - `OutOfRange`
-  - Operation was attempted past the valid range. E.g., seeking or reading past end
-    of file. Unlike `InvalidArgument`, this error indicates a problem that may be
-    fixed if the system state changes.
+  - Operation was attempted past the valid range.
+    E.g., seeking or reading past end of file.
+    Unlike `InvalidArgument`, this error indicates a problem that may be fixed if the system state changes.
 - `Unimplemented`
   - Operation is not implemented or not supported/enabled in this service.
 - `Internal`
-  - Internal errors. Means some invariants expected by underlying system has been
-    broken.
+  - Internal errors.
+    Means some invariants expected by the underlying system have been broken.
 - `Unavailable`
-  - The service is currently unavailable. This is a most likely a transient
-    condition and may be corrected by retrying with a backoff.
+  - The service is currently unavailable.
+    This is most likely a transient condition and may be corrected by retrying with a backoff.
 - `DataLoss`
   - Unrecoverable data loss or corruption.
 - `Unauthenticated`
@@ -590,49 +491,34 @@ Returns true if the canonical code of this `Status` is `Ok`, otherwise false.

 ## SpanKind

-`SpanKind` describes the relationship between the Span, its parents,
-and its children in a Trace. `SpanKind` describes two independent
-properties that benefit tracing systems during analysis.
-
-The first property described by `SpanKind` reflects whether the Span
-is a remote child or parent. Spans with a remote parent are
-interesting because they are sources of external load. Spans with a
-remote child are interesting because they reflect a non-local system
-dependency.
-
-The second property described by `SpanKind` reflects whether a child
-Span represents a synchronous call. When a child span is synchronous,
-the parent is expected to wait for it to complete under ordinary
-circumstances. It can be useful for tracing systems to know this
-property, since synchronous Spans may contribute to the overall trace
-latency. Asynchronous scenarios can be remote or local.
-
-In order for `SpanKind` to be meaningful, callers should arrange that
-a single Span does not serve more than one purpose. For example, a
-server-side span should not be used directly as the parent of another
-remote span. As a simple guideline, instrumentation should create a
-new Span prior to extracting and serializing the span context for a
-remote call.
+`SpanKind` describes the relationship between the Span, its parents, and its children in a Trace.
+`SpanKind` describes two independent properties that benefit tracing systems during analysis.
+
+The first property described by `SpanKind` reflects whether the Span is a remote child or parent.
+Spans with a remote parent are interesting because they are sources of external load.
+Spans with a remote child are interesting because they reflect a non-local system dependency.
+
+The second property described by `SpanKind` reflects whether a child Span represents a synchronous call.
+When a child span is synchronous, the parent is expected to wait for it to complete under ordinary circumstances.
+It can be useful for tracing systems to know this property, since synchronous Spans may contribute to the overall trace latency.
+Asynchronous scenarios can be remote or local.
+
+In order for `SpanKind` to be meaningful, callers should arrange that a single Span does not serve more than one purpose.
+For example, a server-side span should not be used directly as the parent of another remote span.
+As a simple guideline, instrumentation should create a new Span prior to extracting and serializing the span context for a remote call.

 These are the possible SpanKinds:

-* `SERVER` Indicates that the span covers server-side handling of a
-  synchronous RPC or other remote request. This span is the child of
-  a remote `CLIENT` span that was expected to wait for a response.
-* `CLIENT` Indicates that the span describes a synchronous request to
-  some remote service. This span is the parent of a remote `SERVER`
-  span and waits for its response.
-* `PRODUCER` Indicates that the span describes the parent of an
-  asynchronous request. This parent span is expected to end before
-  the corresponding child `CONSUMER` span, possibly even before the
-  child span starts. In messaging scenarios with batching, tracing
-  individual messages requires a new `PRODUCER` span per message to
-  be created.
-* `CONSUMER` Indicates that the span describes the child of an
-  asynchronous `PRODUCER` request.
-* `INTERNAL` Default value. Indicates that the span represents an
-  internal operation within an application, as opposed to an
-  operations with remote parents or children.
+* `SERVER` Indicates that the span covers server-side handling of a synchronous RPC or other remote request.
+  This span is the child of a remote `CLIENT` span that was expected to wait for a response.
+* `CLIENT` Indicates that the span describes a synchronous request to some remote service.
+  This span is the parent of a remote `SERVER` span and waits for its response.
+* `PRODUCER` Indicates that the span describes the parent of an asynchronous request.
+  This parent span is expected to end before the corresponding child `CONSUMER` span, possibly even before the child span starts.
+  In messaging scenarios with batching, tracing individual messages requires a new `PRODUCER` span per message to be created.
+* `CONSUMER` Indicates that the span describes the child of an asynchronous `PRODUCER` request.
+* `INTERNAL` Default value.
+  Indicates that the span represents an internal operation within an application, as opposed to operations with remote parents or children.

 To summarize the interpretation of these kinds:

diff --git a/specification/trace/sdk.md b/specification/trace/sdk.md
index 4a361033eaf..d5f5ffcbd9a 100644
--- a/specification/trace/sdk.md
+++ b/specification/trace/sdk.md
@@ -13,50 +13,35 @@

 ## Sampling

-Sampling is a mechanism to control the noise and overhead introduced by
-OpenTelemetry by reducing the number of samples of traces collected and sent to
-the backend.
+Sampling is a mechanism to control the noise and overhead introduced by OpenTelemetry by reducing the number of samples of traces collected and sent to the backend.

 Sampling may be implemented on different stages of a trace collection.
-OpenTelemetry API defines a `Sampler` interface that can be used at
-instrumentation points by libraries to check the `SamplingResult` early and
-optimize the amount of telemetry that needs to be collected.
+OpenTelemetry API defines a `Sampler` interface that can be used at instrumentation points by libraries to check the `SamplingResult` early and optimize the amount of telemetry that needs to be collected.

-All other sampling algorithms may be implemented on SDK layer in exporters, or
-even out of process in Agent or Collector.
+All other sampling algorithms may be implemented on the SDK layer in exporters, or even out of process in the Agent or Collector.

 The OpenTelemetry API has two properties responsible for the data collection:

-* `IsRecording` field of a `Span`. If `true` the current `Span` records
-  tracing events (attributes, events, status, etc.), otherwise all tracing
-  events are dropped. Users can use this property to determine if expensive
-  trace events can be avoided. [Span Processors](#span-processor) will receive
-  all spans with this flag set. However, [Span Exporter](#span-exporter) will
-  not receive them unless the `Sampled` flag was set.
-* `Sampled` flag in `TraceFlags` on `SpanContext`. This flag is propagated via
-  the `SpanContext` to child Spans. For more details see the [W3C Trace Context
-  specification][trace-flags]. This flag indicates that the `Span` has been
-`sampled` and will be exported. [Span Processor](#span-processor) and [Span
-Exporter](#span-exporter) will receive spans with the `Sampled` flag set for
-processing.
-
-The flag combination `SampledFlag == false` and `IsRecording == true`
-means that the current `Span` does record information, but most likely the child
-`Span` will not.
-
-The flag combination `SampledFlag == true` and `IsRecording == false`
-could cause gaps in the distributed trace, and because of this OpenTelemetry API
-MUST NOT allow this combination.
-
-The SDK defines the two interfaces [`Sampler`](#sampler) and
-[`Decision`](#decision) as well as a set of [built-in
-samplers](#built-in-samplers).
+* `IsRecording` field of a `Span`.
+  If `true` the current `Span` records tracing events (attributes, events, status, etc.), otherwise all tracing events are dropped.
+  Users can use this property to determine if expensive trace events can be avoided.
+  [Span Processors](#span-processor) will receive all spans with this flag set.
+  However, [Span Exporter](#span-exporter) will not receive them unless the `Sampled` flag was set.
+* `Sampled` flag in `TraceFlags` on `SpanContext`.
+  This flag is propagated via the `SpanContext` to child Spans.
+  For more details see the [W3C Trace Context specification][trace-flags].
+  This flag indicates that the `Span` has been `sampled` and will be exported.
+  [Span Processor](#span-processor) and [Span Exporter](#span-exporter) will receive spans with the `Sampled` flag set for processing.
+
+The flag combination `SampledFlag == false` and `IsRecording == true` means that the current `Span` does record information, but most likely the child `Span` will not.
+
+The flag combination `SampledFlag == true` and `IsRecording == false` could cause gaps in the distributed trace, and because of this the OpenTelemetry API MUST NOT allow this combination.
+
+The SDK defines the two interfaces [`Sampler`](#sampler) and [`Decision`](#decision) as well as a set of [built-in samplers](#built-in-samplers).

 ### Sampler

-`Sampler` interface allows to create custom samplers which will return a
-sampling `SamplingResult` based on information that is typically available just
-before the `Span` was created.
+The `Sampler` interface allows creating custom samplers which return a sampling `SamplingResult` based on information that is typically available just before the `Span` is created.

 #### ShouldSample

 Returns the sampling Decision for a `Span` to be created.

 **Required arguments:**

-* `SpanContext` of a parent `Span`. Typically extracted from the wire. Can be
-  `null`.
-* `TraceId` of the `Span` to be created. It can be different from the `TraceId`
-  in the `SpanContext`. Typically in situations when the `Span` to be created
-  starts a new Trace.
+* `SpanContext` of a parent `Span`. Typically extracted from the wire. Can be `null`.
+* `TraceId` of the `Span` to be created.
+  It can be different from the `TraceId` in the `SpanContext`.
+  Typically, this happens when the `Span` to be created starts a new Trace.
 * `SpanId` of the `Span` to be created.
 * Name of the `Span` to be created.
 * `SpanKind`
 * Initial set of `Attributes` for the `Span` being constructed
 * Collection of links that will be associated with the `Span` to be created.
-  Typically useful for batch operations, see
-  [Links Between Spans](../overview.md#links-between-spans).
+  Typically useful for batch operations, see [Links Between Spans](../overview.md#links-between-spans).
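+
+As a non-normative sketch, these arguments could surface in a Go-flavored `Sampler` interface as follows; all type names are illustrative placeholders, and the `SamplingResult` type is described under the return value below:
+
+```go
+package example
+
+// Illustrative placeholder types; the concrete representations are
+// defined by each language library.
+type (
+	SpanContext struct{}
+	TraceID     [16]byte
+	SpanID      [8]byte
+	SpanKind    int
+	Attribute   struct {
+		Key   string
+		Value interface{}
+	}
+	Link           struct{}
+	SamplingResult struct{}
+)
+
+type Sampler interface {
+	// ShouldSample returns the sampling decision for a Span to be created.
+	ShouldSample(
+		parent *SpanContext, // parent SpanContext, possibly nil
+		traceID TraceID, // TraceId of the Span to be created
+		spanID SpanID, // SpanId of the Span to be created
+		name string, // name of the Span to be created
+		kind SpanKind,
+		attributes []Attribute, // initial attributes
+		links []Link, // links, e.g. for batch operations
+	) SamplingResult
+	// GetDescription returns a short, stable description of the sampler,
+	// e.g. "ProbabilitySampler{0.000100}".
+	GetDescription() string
+}
+```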
 **Return value:**

 It produces an output called `SamplingResult` which contains:

 * A sampling `Decision`. One of the following enum values:
-  * `NOT_RECORD` - `IsRecording() == false`, span will not be recorded and all events and attributes
-    will be dropped.
+  * `NOT_RECORD` - `IsRecording() == false`, span will not be recorded and all events and attributes will be dropped.
   * `RECORD` - `IsRecording() == true`, but `Sampled` flag MUST NOT be set.
   * `RECORD_AND_SAMPLED` - `IsRecording() == true` AND `Sampled` flag MUST be set.
 * A set of span Attributes that will also be added to the `Span`.
   * The list of attributes returned by `SamplingResult` MUST be immutable.
-    Caller may call this method any number of times and can safely cache the
-    returned value.
+    Caller may call this method any number of times and can safely cache the returned value.

 #### GetDescription

-Returns the sampler name or short description with the configuration. This may
-be displayed on debug pages or in the logs. Example:
-`"ProbabilitySampler{0.000100}"`.
+Returns the sampler name or short description with the configuration.
+This may be displayed on debug pages or in the logs.
+Example: `"ProbabilitySampler{0.000100}"`.

 Description MUST NOT change over time and caller can cache the returned value.

@@ -109,69 +90,51 @@ These are the default samplers implemented in the OpenTelemetry SDK:

 * ALWAYS_OFF
   * Description MUST be `AlwaysOffSampler`.
 * ALWAYS_PARENT
-  * `Returns RECORD_AND_SAMPLED` if `SampledFlag` is set to true on parent
-    SpanContext and `NOT_RECORD` otherwise.
+  * Returns `RECORD_AND_SAMPLED` if `SampledFlag` is set to true on the parent SpanContext and `NOT_RECORD` otherwise.
   * Description MUST be `AlwaysParentSampler`.
 * Probability
-  * The default behavior should be to trust the parent `SampledFlag`. However
-    there should be configuration to change this.
-  * The default behavior is to apply the sampling probability only for Spans
-    that are root spans (no parent) and Spans with remote parent. However there
-    should be configuration to change this to "root spans only", or "all spans".
+  * The default behavior should be to trust the parent `SampledFlag`.
+    However, there should be configuration to change this.
+  * The default behavior is to apply the sampling probability only for Spans that are root spans (no parent) and Spans with a remote parent.
+    However, there should be configuration to change this to "root spans only", or "all spans".
   * Description MUST be `ProbabilitySampler{0.000100}`.

 #### Probability Sampler algorithm

-TODO: Add details about how the probability sampler is implemented as a function
-of the `TraceID`.
+TODO: Add details about how the probability sampler is implemented as a function of the `TraceID`.

 ## Tracer Creation

-New `Tracer` instances are always created through a `TracerProvider` (see
-[API](api.md#obtaining-a-tracer)). The `name` and `version` arguments
-supplied to the `TracerProvider` must be used to create a
-[`Resource`](../resource/sdk.md) instance which is stored on the created `Tracer`.
+New `Tracer` instances are always created through a `TracerProvider` (see [API](api.md#obtaining-a-tracer)).
+The `name` and `version` arguments supplied to the `TracerProvider` must be used to create a [`Resource`](../resource/sdk.md) instance which is stored on the created `Tracer`.

-All configuration objects (SDK specific) and extension points (span processors,
-propagators) must be provided to the `TracerProvider`. `Tracer` instances must
-not duplicate this data (unless for read-only access) to avoid that different
-`Tracer` instances of a `TracerProvider` have different versions of these data.
+All configuration objects (SDK specific) and extension points (span processors, propagators) must be provided to the `TracerProvider`.
+`Tracer` instances must not duplicate this data (unless for read-only access), so that different `Tracer` instances of a `TracerProvider` do not hold different versions of this data.

-The readable representations of all `Span` instances created by a `Tracer` must
-provide a `getLibraryResource` method that returns this `Resource` information
-held by the `Tracer`.
+The readable representations of all `Span` instances created by a `Tracer` must provide a `getLibraryResource` method that returns this `Resource` information held by the `Tracer`.

 ## Span processor

-Span processor is an interface which allows hooks for span start and end method
-invocations. The span processors are invoked only when
-[`IsRecording`](api.md#isrecording) is true.
+A span processor is an interface which allows hooks for span start and end method invocations.
+The span processors are invoked only when [`IsRecording`](api.md#isrecording) is true.

-Built-in span processors are responsible for batching and conversion of spans to
-exportable representation and passing batches to exporters.
+Built-in span processors are responsible for batching and conversion of spans to exportable representation and passing batches to exporters.

-Span processors can be registered directly on SDK `TracerProvider` and they are
-invoked in the same order as they were registered.
+Span processors can be registered directly on the SDK `TracerProvider` and they are invoked in the same order as they were registered.

 All `Tracer` instances created by a `TracerProvider` share the same span processors.
 Changes to this collection reflect in all `Tracer` instances.
-Implementation-wise, this could mean that `Tracer` instances have a reference to
-their `TracerProvider` and can access span processor objects only via this
-reference.
+Implementation-wise, this could mean that `Tracer` instances have a reference to their `TracerProvider` and can access span processor objects only via this reference.

-Manipulation of the span processors collection must only happen on `TracerProvider`
-instances. This means methods like `addSpanProcessor` must be implemented on
-`TracerProvider`.
+Manipulation of the span processors collection must only happen on `TracerProvider` instances.
+This means methods like `addSpanProcessor` must be implemented on `TracerProvider`.

-Each processor registered on `TracerProvider` is a start of pipeline that consist
-of span processor and optional exporter. SDK MUST allow to end each pipeline with
-individual exporter.
+Each processor registered on `TracerProvider` is the start of a pipeline that consists of a span processor and an optional exporter.
+The SDK MUST allow each pipeline to end with an individual exporter.

-SDK MUST allow users to implement and configure custom processors and decorate
-built-in processors for advanced scenarios such as tagging or filtering.
+The SDK MUST allow users to implement and configure custom processors and decorate built-in processors for advanced scenarios such as tagging or filtering.
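+
+As a non-normative Go sketch of such a decorator, assuming a `SpanProcessor` interface shaped like the hooks described below (`OnStart`, `OnEnd`, `Shutdown`, `ForceFlush`); all names here are illustrative:
+
+```go
+package example
+
+// Span is a stand-in for the SDK's span representation.
+type Span interface{}
+
+// SpanProcessor mirrors the hooks described in this section.
+type SpanProcessor interface {
+	OnStart(span Span)
+	OnEnd(span Span)
+	Shutdown()
+	ForceFlush()
+}
+
+// filteringProcessor decorates another processor (e.g. the batching
+// processor) and forwards only the spans that match a predicate.
+type filteringProcessor struct {
+	next SpanProcessor
+	keep func(Span) bool
+}
+
+func (p *filteringProcessor) OnStart(span Span) {
+	if p.keep(span) {
+		p.next.OnStart(span)
+	}
+}
+
+func (p *filteringProcessor) OnEnd(span Span) {
+	if p.keep(span) {
+		p.next.OnEnd(span)
+	}
+}
+
+func (p *filteringProcessor) Shutdown()   { p.next.Shutdown() }
+func (p *filteringProcessor) ForceFlush() { p.next.ForceFlush() }
+```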
-
-The following diagram shows `SpanProcessor`'s relationship to other components
-in the SDK:
+The following diagram shows `SpanProcessor`'s relationship to other components in the SDK:

 ```
 +-----+--------------+   +-------------------------+   +-------------------+
@@ -192,9 +155,8 @@ in the SDK:

 #### OnStart(Span)

-`OnStart` is called when a span is started. This method is called synchronously
-on the thread that started the span, therefore it should not block or throw
-exceptions.
+`OnStart` is called when a span is started.
+This method is called synchronously on the thread that started the span, therefore it should not block or throw exceptions.

 **Parameters:**

@@ -204,8 +166,8 @@ exceptions.

 #### OnEnd(Span)

-`OnEnd` is called when a span is ended. This method is called synchronously on
-the execution thread, therefore it should not block or throw an exception.
+`OnEnd` is called when a span is ended.
+This method is called synchronously on the execution thread, therefore it should not block or throw an exception.

 **Parameters:**

@@ -215,14 +177,15 @@ the execution thread, therefore it should not block or throw an exception.

 #### Shutdown()

-Shuts down the processor. Called when SDK is shut down. This is an opportunity
-for processor to do any cleanup required.
+Shuts down the processor.
+Called when the SDK is shut down.
+This is an opportunity for the processor to do any cleanup required.

-Shutdown should be called only once for each `Processor` instance. After the
-call to shutdown subsequent calls to `onStart`, `onEnd`, or `forceFlush` are not allowed.
+Shutdown should be called only once for each `Processor` instance.
+After the call to shutdown, subsequent calls to `onStart`, `onEnd`, or `forceFlush` are not allowed.

-Shutdown should not block indefinitely. Language library authors can decide if
-they want to make the shutdown timeout configurable.
+Shutdown should not block indefinitely.
+Language library authors can decide if they want to make the shutdown timeout configurable.

 #### ForceFlush()

@@ -230,19 +193,17 @@ Export all ended spans to the configured `Exporter` that have not yet been expor

 `ForceFlush` should only be called in cases where it is absolutely necessary, such as when using some FaaS providers that may suspend the process after an invocation, but before the `Processor` exports the completed spans.

-`ForceFlush` should not block indefinitely. Language library authors can decide if they want to make the flush timeout configurable.
+`ForceFlush` should not block indefinitely.
+Language library authors can decide if they want to make the flush timeout configurable.

 ### Built-in span processors

-The standard OpenTelemetry SDK MUST implement both simple and batch processors,
-as described below. Other common processing scenarios should be first considered
-for implementation out-of-process in [OpenTelemetry Collector](../overview.md#collector)
+The standard OpenTelemetry SDK MUST implement both simple and batch processors, as described below.
+Other common processing scenarios should first be considered for implementation out-of-process in the [OpenTelemetry Collector](../overview.md#collector).

 #### Simple processor

-This is an implementation of `SpanProcessor` which passes finished spans
-and passes the export-friendly span data representation to the configured
-`SpanExporter`, as soon as they are finished.
+This is an implementation of `SpanProcessor` which passes the export-friendly span data representation of finished spans to the configured `SpanExporter` as soon as they are finished.

 **Configurable parameters:**

@@ -250,112 +211,91 @@

 #### Batching processor

-This is an implementation of the `SpanProcessor` which create batches of finished
-spans and passes the export-friendly span data representations to the
-configured `SpanExporter`.
+This is an implementation of the `SpanProcessor` which creates batches of finished spans and passes the export-friendly span data representations to the configured `SpanExporter`.

 **Configurable parameters:**

 * `exporter` - the exporter where the spans are pushed.
-* `maxQueueSize` - the maximum queue size. After the size is reached spans are
-  dropped. The default value is `2048`.
-* `scheduledDelayMillis` - the delay interval in milliseconds between two
-  consecutive exports. The default value is `5000`.
+* `maxQueueSize` - the maximum queue size.
+  After the size is reached spans are dropped.
+  The default value is `2048`.
+* `scheduledDelayMillis` - the delay interval in milliseconds between two consecutive exports.
+  The default value is `5000`.
 * `exporterTimeoutMillis` - how long the export can run before it is cancelled.
   The default value is `30000`.
-* `maxExportBatchSize` - the maximum batch size of every export. It must be
-  smaller or equal to `maxQueueSize`. The default value is `512`.
+* `maxExportBatchSize` - the maximum batch size of every export.
+  It must be smaller than or equal to `maxQueueSize`.
+  The default value is `512`.

 ## Span Exporter

-`Span Exporter` defines the interface that protocol-specific exporters must
-implement so that they can be plugged into OpenTelemetry SDK and support sending
-of telemetry data.
+`Span Exporter` defines the interface that protocol-specific exporters must implement so that they can be plugged into the OpenTelemetry SDK and support sending of telemetry data.

-The goal of the interface is to minimize burden of implementation for
-protocol-dependent telemetry exporters. The protocol exporter is expected to be
-primarily a simple telemetry data encoder and transmitter.
+The goal of the interface is to minimize the burden of implementation for protocol-dependent telemetry exporters.
+The protocol exporter is expected to be primarily a simple telemetry data encoder and transmitter.

 ### Interface Definition

-The exporter must support two functions: **Export** and **Shutdown**. In
-strongly typed languages typically there will be 2 separate `Exporter`
-interfaces, one that accepts spans (SpanExporter) and one that accepts metrics
-(MetricsExporter).
+The exporter must support two functions: **Export** and **Shutdown**.
+In strongly typed languages there will typically be two separate `Exporter` interfaces, one that accepts spans (SpanExporter) and one that accepts metrics (MetricsExporter).

 #### `Export(batch)`

-Exports a batch of telemetry data. Protocol exporters that will implement this
-function are typically expected to serialize and transmit the data to the
-destination.
+Exports a batch of telemetry data.
+Protocol exporters that implement this function are typically expected to serialize and transmit the data to the destination.

 Export() will never be called concurrently for the same exporter instance.
 Export() can be called again only after the current call returns.
-Export() must not block indefinitely, there must be a reasonable upper limit
-after which the call must time out with an error result (`Failure`).
+Export() must not block indefinitely; there must be a reasonable upper limit after which the call must time out with an error result (`Failure`).

-Any retry logic that is required by the exporter is the responsibility
-of the exporter. The default SDK SHOULD NOT implement retry logic, as
-the required logic is likely to depend heavily on the specific protocol
-and backend the spans are being sent to.
+Any retry logic that is required by the exporter is the responsibility of the exporter.
+The default SDK SHOULD NOT implement retry logic, as the required logic is likely to depend heavily on the specific protocol and backend the spans are being sent to.

 **Parameters:**

-batch - a batch of telemetry data. The exact data type of the batch is language
-specific, typically it is a list of telemetry items, e.g. for spans in Java it
-will be typically `Collection`.
+batch - a batch of telemetry data.
+The exact data type of the batch is language specific; typically it is a list of telemetry items, e.g. for spans in Java it will typically be `Collection`.

-Note that the data type for a span for illustration purposes here is written as
-an imaginary type ExportableSpan (similarly for metrics it would be e.g.
-ExportableMetrics). The actual data type must be specified by language library
-authors, it should be able to represent the span data that can be read by the
-exporter.
+Note that the data type for a span is written here, for illustration purposes, as an imaginary type ExportableSpan (similarly for metrics it would be e.g. ExportableMetrics).
+The actual data type must be specified by language library authors; it should be able to represent the span data that can be read by the exporter.

 **Returns:** ExportResult:

 ExportResult is one of:

 * `Success` - The batch has been successfully exported.
-  For protocol exporters this typically means that the data is sent over
-  the wire and delivered to the destination server.
-* `Failure` - exporting failed. The batch must be dropped. For example, this
-  can happen when the batch contains bad data and cannot be serialized.
+  For protocol exporters this typically means that the data is sent over the wire and delivered to the destination server.
+* `Failure` - exporting failed. The batch must be dropped.
+  For example, this can happen when the batch contains bad data and cannot be serialized.

 #### `Shutdown()`

-Shuts down the exporter. Called when SDK is shut down. This is an opportunity
-for exporter to do any cleanup required.
+Shuts down the exporter.
+Called when the SDK is shut down.
+This is an opportunity for the exporter to do any cleanup required.

-`Shutdown` should be called only once for each `Exporter` instance. After the
-call to `Shutdown` subsequent calls to `Export` are not allowed and should
-return a `Failure` result.
+`Shutdown` should be called only once for each `Exporter` instance.
+After the call to `Shutdown`, subsequent calls to `Export` are not allowed and should return a `Failure` result.

-`Shutdown` should not block indefinitely (e.g. if it attempts to flush the data
-and the destination is unavailable). Language library authors can decide if they
-want to make the shutdown timeout configurable.
+`Shutdown` should not block indefinitely (e.g. if it attempts to flush the data and the destination is unavailable).
+Language library authors can decide if they want to make the shutdown timeout configurable.

 ### Further Language Specialization

-Based on the generic interface definition laid out above library authors must
-define the exact interface for the particular language.
+Based on the generic interface definition laid out above, library authors must define the exact interface for the particular language.

-Authors are encouraged to use efficient data structures on the interface
-boundary that are well suited for fast serialization to wire formats by protocol
-exporters and minimize the pressure on memory managers. The latter typically
-requires understanding of how to optimize the rapidly-generated, short-lived
-telemetry data structures to make life easier for the memory manager of the
-specific language. General recommendation is to minimize the number of
-allocations and use allocation arenas where possible, thus avoiding explosion of
-allocation/deallocation/collection operations in the presence of high rate of
-telemetry data generation.
+Authors are encouraged to use efficient data structures on the interface boundary that are well suited for fast serialization to wire formats by protocol exporters and minimize the pressure on memory managers.
+The latter typically requires understanding of how to optimize the rapidly-generated, short-lived telemetry data structures to make life easier for the memory manager of the specific language.
+The general recommendation is to minimize the number of allocations and use allocation arenas where possible, thus avoiding an explosion of allocation/deallocation/collection operations in the presence of a high rate of telemetry data generation.

 #### Examples

-These are examples on what the `Exporter` interface can look like in specific
-languages. Examples are for illustration purposes only. Language library authors
-are free to deviate from these provided that their design remain true to the
-spirit of `Exporter` concept.
+These are examples of what the `Exporter` interface can look like in specific languages.
+Examples are for illustration purposes only.
+Language library authors are free to deviate from these provided that their design remains true to the spirit of the `Exporter` concept.

 ##### Go SpanExporter Interface

diff --git a/specification/trace/sdk_exporters/zipkin.md b/specification/trace/sdk_exporters/zipkin.md
index c9848bb7006..52cf03fb494 100644
--- a/specification/trace/sdk_exporters/zipkin.md
+++ b/specification/trace/sdk_exporters/zipkin.md
@@ -1,13 +1,11 @@
 # OpenTelemetry to Zipkin Transformation

 This document defines the transformation between OpenTelemetry and Zipkin Spans.
-Zipkin's v2 API is defined in the
-[zipkin.proto](https://github.com/openzipkin/zipkin-api/blob/master/zipkin.proto)
+Zipkin's v2 API is defined in the [zipkin.proto](https://github.com/openzipkin/zipkin-api/blob/master/zipkin.proto).

 ## Summary

-The following table summarizes the major transformations between OpenTelemetry
-and Zipkin.
+The following table summarizes the major transformations between OpenTelemetry and Zipkin.

 | OpenTelemetry | Zipkin | Notes |
 | ------------------------ | ---------------- | ------------------------------------------------------------ |
@@ -25,8 +23,7 @@ and Zipkin.
 | Span.Status | Add to Span.Tags | See [Status](#status) for tag names to use. |
 | Span.LocalChildSpanCount | TBD | TBD |

-TBD : This is work in progress document and it is currently doesn't specify
-mapping for these fields:
+TBD : This is a work-in-progress document and it currently doesn't specify mapping for these fields:

 OpenTelemetry fields:

@@ -48,13 +45,11 @@ Zipkin fields:

 ## Mappings

-This section discusses the details of the transformations between OpenTelemetry
-and Zipkin.
+This section discusses the details of the transformations between OpenTelemetry and Zipkin.

 ### SpanKind

-The following table lists all the `SpanKind` mappings between OpenTelemetry and
-Zipkin.
+The following table lists all the `SpanKind` mappings between OpenTelemetry and Zipkin.

 | OpenTelemetry | Zipkin | Note |
 | ------------- | ------ | ---- |
@@ -68,12 +63,10 @@ Zipkin.

 OpenTelemetry Span `Attribute`(s) MUST be reported as `tags` to Zipkin.

 Primitive types MUST be converted to string using en-US culture settings.
-Boolean values must use lower case strings `"true"` and `"false"`, except an
-attribute named `error`. In case if value of the attribute is `false`, Zipkin
-tag needs to be omitted.
+Boolean values must use lower case strings `"true"` and `"false"`, except an attribute named `error`.
+If the value of the attribute is `false`, the Zipkin tag needs to be omitted.

-Array values MUST be serialized to string like a JSON list as mentioned in
-[semantic conventions](../../overview.md#semantic-conventions).
+Array values MUST be serialized to a string as a JSON list, as mentioned in [semantic conventions](../../overview.md#semantic-conventions).

 TBD: add examples

@@ -88,14 +81,12 @@ The following table defines the OpenTelemetry `Status` to Zipkin `tags` mapping.

 |Code | `ot.status_code` | Name of the code, for example: `OK` |
 |Message *(optional)* | `ot.status_description` | `{message}` |

-The `ot.status_code` tag value MUST follow the [Standard GRPC Code
-Names](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md).
+The `ot.status_code` tag value MUST follow the [Standard GRPC Code Names](https://github.com/grpc/grpc/blob/master/doc/statuscodes.md).

 ### Events

-OpenTelemetry `Event` has an optional `Attribute`(s) which is not supported by
-Zipkin. Events MUST be converted to the Annotations with the names which will
-include attribute values like this:
+An OpenTelemetry `Event` has optional `Attribute`(s), which are not supported by Zipkin.
+Events MUST be converted to Annotations with names that include the attribute values, like this:

 ```
 "my-event-name": { "key1" : "value1", "key2": "value2" }
@@ -103,14 +94,11 @@ include attribute values like this:

 ### Unit of Time

-Zipkin times like `timestamp`, `duration` and `annotation.timestamp` MUST be
-reported in microseconds with decimal accuracy. For example, `duration` of 1234
-nanoseconds will be represented as 1.234 microseconds.
+Zipkin times like `timestamp`, `duration` and `annotation.timestamp` MUST be reported in microseconds with decimal accuracy.
+For example, a `duration` of 1234 nanoseconds will be represented as 1.234 microseconds.

 ## Request Payload

-For performance considerations, Zipkin fields that can be absent SHOULD be
-omitted from the payload when they are empty in the OpenTelemetry `Span`.
+For performance considerations, Zipkin fields that can be absent SHOULD be omitted from the payload when they are empty in the OpenTelemetry `Span`.

-For example, an OpenTelemetry `Span` without any `Event` should not have an
-`annotations` field in the Zipkin payload.

diff --git a/specification/trace/semantic_conventions/README.md b/specification/trace/semantic_conventions/README.md
index 20248ddb191..4dde6e8a2ad 100644
--- a/specification/trace/semantic_conventions/README.md
+++ b/specification/trace/semantic_conventions/README.md
@@ -1,15 +1,11 @@
# Trace Semantic Conventions

-In OpenTelemetry spans can be created freely and it’s up to the implementor to
-annotate them with attributes specific to the represented operation. Spans
-represent specific operations in and between systems. Some of these operations
-represent calls that use well-known protocols like HTTP or database calls.
-Depending on the protocol and the type of operation, additional information
-is needed to represent and analyze a span correctly in monitoring systems. It is
-also important to unify how this attribution is made in different languages.
-This way, the operator will not need to learn specifics of a language and
-telemetry collected from polyglot (multi-language) micro-service environments
-can still be easily correlated and cross-analyzed.
+In OpenTelemetry spans can be created freely and it’s up to the implementor to annotate them with attributes specific to the represented operation.
+Spans represent specific operations in and between systems.
+Some of these operations represent calls that use well-known protocols like HTTP, or database calls.
+Depending on the protocol and the type of operation, additional information is needed to represent and analyze a span correctly in monitoring systems.
+It is also important to unify how this attribution is made in different languages.
+This way, the operator will not need to learn the specifics of a language, and telemetry collected from polyglot (multi-language) micro-service environments can still be easily correlated and cross-analyzed.

The following semantic conventions for spans are defined:

diff --git a/specification/trace/semantic_conventions/database.md b/specification/trace/semantic_conventions/database.md
index 8360c731d0d..2fa3af4fcd5 100644
--- a/specification/trace/semantic_conventions/database.md
+++ b/specification/trace/semantic_conventions/database.md
@@ -2,14 +2,11 @@

For database client call the `SpanKind` MUST be `Client`.

-Span `name` should be set to low cardinality value representing the statement
-executed on the database. It may be stored procedure name (without argument), sql
-statement without variable arguments, etc. When it's impossible to get any
-meaningful representation of the span `name`, it can be populated using the same
-value as `db.instance`.
+Span `name` should be set to a low cardinality value representing the statement executed on the database.
+It may be a stored procedure name (without arguments), a SQL statement without variable arguments, etc.
+When it's impossible to get any meaningful representation of the span `name`, it can be populated using the same value as `db.instance`.

-Note, Redis, Cassandra, HBase and other storage systems may reuse the same
-attribute names.
+Note that Redis, Cassandra, HBase and other storage systems may reuse the same attribute names.
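+
+For illustration only, a Go sketch of the naming fallback described above; the helper and its parameters are hypothetical, and `db.instance` is described in the attribute table below:
+
+```go
+// dbSpanName returns a low cardinality span name for a database call:
+// a statement representation when one is available (e.g. a stored
+// procedure name without arguments), otherwise the db.instance value.
+func dbSpanName(statement, dbInstance string) string {
+    if statement != "" {
+        return statement
+    }
+    return dbInstance
+}
+```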

| Attribute name | Notes and examples | Required? |
| :------------- | :----------------------------------------------------------- | --------- |

diff --git a/specification/trace/semantic_conventions/faas.md b/specification/trace/semantic_conventions/faas.md
index 6bf2e677767..689f170db54 100644
--- a/specification/trace/semantic_conventions/faas.md
+++ b/specification/trace/semantic_conventions/faas.md
@@ -61,7 +61,8 @@ For `faas` spans with trigger `datasource`, it is recommended to set the followi

### HTTP

-The function responsibility is to provide an answer to an inbound HTTP request. The `faas` span SHOULD follow the recommendations described in the [HTTP Server semantic conventions](http.md#http-server-semantic-conventions).
+The function's responsibility is to provide an answer to an inbound HTTP request.
+The `faas` span SHOULD follow the recommendations described in the [HTTP Server semantic conventions](http.md#http-server-semantic-conventions).

### PubSub

@@ -72,7 +73,8 @@ This way, it is possible to correlate each individual message with its execution

### Timer

-A function is scheduled to be executed regularly. The following additional attributes are recommended.
+A function is scheduled to be executed regularly.
+The following additional attributes are recommended.

| Attribute name | Notes and examples | Required? |
|---|---|--|

diff --git a/specification/trace/semantic_conventions/http.md b/specification/trace/semantic_conventions/http.md
index 4d4f0e169f1..e4e6ab4e395 100644
--- a/specification/trace/semantic_conventions/http.md
+++ b/specification/trace/semantic_conventions/http.md
@@ -1,8 +1,7 @@
# Semantic conventions for HTTP spans

This document defines semantic conventions for HTTP client and server Spans.
-They can be used for http and https schemes
-and various HTTP versions like 1.1, 2 and SPDY.
+They can be used for http and https schemes and various HTTP versions like 1.1, 2 and SPDY.

@@ -22,26 +21,17 @@ and various HTTP versions like 1.1, 2 and SPDY.

## Name

HTTP spans MUST follow the overall [guidelines for span names](../api.md#span).
-Many REST APIs encode parameters into URI path, e.g. `/api/users/123` where `123`
-is a user id, which creates high cardinality value space not suitable for span
-names. In case of HTTP servers, these endpoints are often mapped by the server
-frameworks to more concise _HTTP routes_, e.g. `/api/users/{user_id}`, which are
-recommended as the low cardinality span names. However, the same approach usually
-does not work for HTTP client spans, especially when instrumentation is provided
-by a lower-level middleware that is not aware of the specifics of how the URIs
-are formed. Therefore, HTTP client spans SHOULD be using conservative, low
-cardinality names formed from the available parameters of an HTTP request,
-such as `"HTTP {METHOD_NAME}"`. Instrumentation MUST NOT default to using URI
-path as span name, but MAY provide hooks to allow custom logic to override the
-default span name.
+Many REST APIs encode parameters into the URI path, e.g. `/api/users/123` where `123` is a user id, which creates a high cardinality value space not suitable for span names.
+In the case of HTTP servers, these endpoints are often mapped by the server frameworks to more concise _HTTP routes_, e.g. `/api/users/{user_id}`, which are recommended as the low cardinality span names.
+However, the same approach usually does not work for HTTP client spans, especially when instrumentation is provided by a lower-level middleware that is not aware of the specifics of how the URIs are formed.
+Therefore, HTTP client spans SHOULD use conservative, low cardinality names formed from the available parameters of an HTTP request, such as `"HTTP {METHOD_NAME}"`.
+Instrumentation MUST NOT default to using the URI path as the span name, but MAY provide hooks to allow custom logic to override the default span name.
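+
+For illustration, a minimal Go sketch of the client-side rule above, assuming `net/http`'s `Request`; the override hook mentioned above is omitted:
+
+```go
+// clientSpanName builds a conservative, low cardinality span name of
+// the form "HTTP {METHOD_NAME}"; it never falls back to the URI path.
+func clientSpanName(req *http.Request) string {
+    return "HTTP " + req.Method // e.g. "HTTP GET"
+}
+```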

## Status

-Implementations MUST set the [span status](../api.md#status) if the HTTP communication failed
-or an HTTP error status code is returned (e.g. above 3xx).
+Implementations MUST set the [span status](../api.md#status) if the HTTP communication failed or an HTTP error status code is returned (e.g. above 3xx).

-In the case of an HTTP redirect, the request should normally be considered successful,
-unless the client aborts following redirects due to hitting some limit (redirect loop).
+In the case of an HTTP redirect, the request should normally be considered successful, unless the client aborts following redirects due to hitting some limit (redirect loop).
If following a (chain of) redirect(s) successfully, the status should be set according to the result of the final HTTP request.

Don't set the span status description if the reason can be inferred from `http.status_code` and `http.status_text`.

@@ -93,8 +83,7 @@ This span type represents an outbound HTTP request.

For an HTTP client span, `SpanKind` MUST be `Client`.

-If set, `http.url` must be the originally requested URL,
-before any HTTP-redirects that may happen when executing the request.
+If set, `http.url` must be the originally requested URL, before any HTTP redirects that may happen when executing the request.

One of the following sets of attributes is required (in order of usual preference unless for a particular web client/framework it is known that some other set is preferable for some reason; all strings must be non-empty):

@@ -103,9 +92,7 @@ One of the following sets of attributes is required (in order of usual preferenc
* `http.scheme`, `net.peer.name`, `net.peer.port`, `http.target`
* `http.scheme`, `net.peer.ip`, `net.peer.port`, `http.target`

-Note that in some cases `http.host` might be different
-from the `net.peer.name`
-used to look up the `net.peer.ip` that is actually connected to.
+Note that in some cases `http.host` might be different from the `net.peer.name` used to look up the `net.peer.ip` that is actually connected to.
In that case it is strongly recommended to set the `net.peer.name` attribute in addition to `http.host`.

For status, the following special cases have canonical error codes assigned:

@@ -128,9 +115,7 @@ To understand the attributes defined in this section, it is helpful to read the

### HTTP server definitions

-This section gives a short summary of some concepts
-in web server configuration and web app deployment
-that are relevant to tracing.
+This section gives a short summary of some concepts in web server configuration and web app deployment that are relevant to tracing.

Usually, on a physical host, reachable by one or multiple IP addresses, a single HTTP listener process runs.
If multiple processes are running, they must listen on distinct TCP/UDP ports so that the OS can route incoming TCP/UDP packets to the right one.

@@ -138,22 +123,14 @@ If multiple processes are running, they must listen on distinct TCP/UDP ports so

Within a single server process, there can be multiple **virtual hosts**.
The [HTTP host header][] (in combination with a port number) is normally used to determine to which of them to route incoming HTTP requests.

-The host header value that matches some virtual host is called the virtual hosts's **server name**. If there are multiple aliases for the virtual host, one of them (often the first one listed in the configuration) is called the **primary server name**. See for example, the Apache [`ServerName`][ap-sn] or NGINX [`server_name`][nx-sn] directive or the CGI specification on `SERVER_NAME` ([RFC 3875][rfc-servername]).
+The host header value that matches some virtual host is called the virtual host's **server name**.
+If there are multiple aliases for the virtual host, one of them (often the first one listed in the configuration) is called the **primary server name**.
+See, for example, the Apache [`ServerName`][ap-sn] or NGINX [`server_name`][nx-sn] directive, or the CGI specification on `SERVER_NAME` ([RFC 3875][rfc-servername]).

In practice the HTTP host header is often ignored when just a single virtual host is configured for the IP.

-Within a single virtual host, some servers support the concepts of an **HTTP application**
-(for example in Java, the Servlet JSR defines an application as
-"a collection of servlets, HTML pages, classes, and other resources that make up a complete application on a Web server"
--- SRV.9 in [JSR 53][];
-in a deployment of a Python application to Apache, the application would be the [PEP 3333][] conformant callable that is configured using the
-[`WSGIScriptAlias` directive][modwsgisetup] of `mod_wsgi`).
+Within a single virtual host, some servers support the concept of an **HTTP application** (for example in Java, the Servlet JSR defines an application as "a collection of servlets, HTML pages, classes, and other resources that make up a complete application on a Web server" -- SRV.9 in [JSR 53][]; in a deployment of a Python application to Apache, the application would be the [PEP 3333][] conformant callable that is configured using the [`WSGIScriptAlias` directive][modwsgisetup] of `mod_wsgi`).

-An application can be "mounted" under some **application root**
-(also know as *[context root][]* *[context prefix][]*, or *[document base][]*)
-which is a fixed path prefix of the URL that determines to which application a request is routed
-(e.g., the server could be configured to route all requests that go to an URL path starting with `/webshop/`
-at a particular virtual host
-to the `com.example.webshop` web application).
+An application can be "mounted" under some **application root** (also known as *[context root][]*, *[context prefix][]*, or *[document base][]*) which is a fixed path prefix of the URL that determines to which application a request is routed (e.g., the server could be configured to route all requests that go to an URL path starting with `/webshop/` at a particular virtual host to the `com.example.webshop` web application).

Some servers allow to bind the same HTTP application to multiple `(virtual host, application root)` pairs.

@@ -243,8 +220,5 @@ Span name: `/webshop/articles/:article_id`.

| `http.user_agent` | `"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"` |

Note that following the recommendations above, `http.url` is not set in the above example.
-If set, it would be
-`"https://example.com:8080/webshop/articles/4?s=1"`
-but due to `http.scheme`, `http.host` and `http.target` being set, it would be redundant.
-As explained above, these separate values are preferred but if for some reason the URL is available but the other values are not,
-URL can replace `http.scheme`, `http.host` and `http.target`.
+If set, it would be `"https://example.com:8080/webshop/articles/4?s=1"` but due to `http.scheme`, `http.host` and `http.target` being set, it would be redundant.
+As explained above, these separate values are preferred, but if for some reason the URL is available and the other values are not, the URL can replace `http.scheme`, `http.host` and `http.target`.
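+
+For illustration, the preferred attribute set from the example above written out as plain key/value pairs; SDK attribute APIs vary, so a simple Go map stands in here:
+
+```go
+// Prefer the split attributes; http.url is intentionally omitted
+// because it would be redundant with scheme + host + target.
+var attrs = map[string]string{
+    "http.scheme": "https",
+    "http.host":   "example.com:8080",
+    "http.target": "/webshop/articles/4?s=1",
+}
+```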

diff --git a/specification/trace/semantic_conventions/messaging.md b/specification/trace/semantic_conventions/messaging.md
index 91d832a6536..5a4ca40c31f 100644
--- a/specification/trace/semantic_conventions/messaging.md
+++ b/specification/trace/semantic_conventions/messaging.md
@@ -15,9 +15,12 @@
Although messaging systems are not as standardized as, e.g., HTTP, it is assumed that the following definitions are applicable to most of them that have similar concepts at all (names borrowed mostly from JMS):

-A *message* usually consists of headers (or properties, or meta information) and an optional body. It is sent by a single message *producer* to:
+A *message* usually consists of headers (or properties, or meta information) and an optional body.
+It is sent by a single message *producer* to:

-* Physically: some message *broker* (which can be e.g., a single server, or a cluster, or a local process reached via IPC). The broker handles the actual routing, delivery, re-delivery, persistence, etc. In some messaging systems the broker may be identical or co-located with (some) message consumers.
+* Physically: some message *broker* (which can be e.g., a single server, or a cluster, or a local process reached via IPC).
+  The broker handles the actual routing, delivery, re-delivery, persistence, etc.
+  In some messaging systems the broker may be identical or co-located with (some) message consumers.
* Logically: some particular message *destination*.

A destination is usually identified by some name unique within the messaging system instance, which might look like an URL or a simple one-word identifier.

@@ -27,11 +30,14 @@ A message submitted to a queue is processed by a message *consumer* (usually exa

The consumption of a message can happen in multiple steps.
First, the lower-level receiving of a message at a consumer, and then the logical processing of the message.
-Often, the waiting for a message is not particularly interesting and hidden away in a framework that only invokes some handler function to process a message once one is received
-(in the same way that the listening on a TCP port for an incoming HTTP message is not particularly interesting).
+Often, the waiting for a message is not particularly interesting and is hidden away in a framework that only invokes some handler function to process a message once one is received (in the same way that the listening on a TCP port for an incoming HTTP message is not particularly interesting).
However, in a synchronous conversation, the wait time for a message is important.

+In some messaging systems, a message can receive a reply message that answers a particular other message that was sent earlier.
+All messages that are grouped together by such a reply-relationship are called a *conversation*.
+The grouping usually happens through some sort of "In-Reply-To:" meta information or an explicit conversation ID.
+Sometimes a conversation can span multiple message destinations (e.g. initiated via a topic, continued on a temporary one-to-one queue).

Some messaging systems support the concept of *temporary destination* (often only temporary queues) that are established just for a particular set of communication partners (often one to one) or conversation.
Often such destinations are unnamed or have an auto-generated name.

@@ -89,7 +95,8 @@ Instead span kind should be set to either `CONSUMER` or `SERVER` according to th

#### RabbitMQ

In RabbitMQ, the destination is defined by an _exchange_ and a _routing key_.
-`messaging.destination` MUST be set to the name of the exchange. This will be an empty string if the default exchange is used.
+`messaging.destination` MUST be set to the name of the exchange.
+This will be an empty string if the default exchange is used.
The routing key MUST be provided to the attribute `messaging.rabbitmq.routing_key`, unless it is empty.

## Examples

@@ -159,7 +166,9 @@ Similarly, only one value can be set as `message_id`, so C3 cannot report both `

Depending on the implementation, the producing spans might still be available in the meta data of the messages and should be added to C3 as links.
The client library or application could also add the receiver span's span context to the data structure it returns for each message.
In this case, C3 could also add links to the receiver spans C1 and C2.

-The status of the batch processing span is selected by the application. Depending on the semantics of the operation. A span status `Ok` could, for example, be set only if all messages or if just at least one were properly processed.
+The status of the batch processing span is selected by the application, depending on the semantics of the operation.
+A span status `Ok` could, for example, be set only if all messages were properly processed, or if just at least one was.

```
Process P: | Span Prod1 | Span Prod2 |

diff --git a/specification/trace/semantic_conventions/rpc.md b/specification/trace/semantic_conventions/rpc.md
index a77966b4362..11d06851093 100644
--- a/specification/trace/semantic_conventions/rpc.md
+++ b/specification/trace/semantic_conventions/rpc.md
@@ -1,7 +1,6 @@
# Semantic conventions for RPC spans

-This document defines how to describe remote procedure calls
-(also called "remote method invocations" / "RMI") with spans.
+This document defines how to describe remote procedure calls (also called "remote method invocations" / "RMI") with spans.

@@ -45,15 +44,12 @@ For server-side spans `net.peer.port` is optional (it describes the port the cli

### Status

-Implementations MUST set status which MUST be the same as the gRPC client/server
-status. The mapping between gRPC canonical codes and OpenTelemetry status codes
-is 1:1 as OpenTelemetry canonical codes is just a snapshot of grpc codes which
-can be found [here](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
+Implementations MUST set the status, which MUST be the same as the gRPC client/server status.
+The mapping between gRPC canonical codes and OpenTelemetry status codes is 1:1, as the OpenTelemetry canonical codes are just a snapshot of the gRPC codes, which can be found [here](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
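+
+For illustration, a hedged Go sketch of the 1:1 status copy; `status.FromError` is from `google.golang.org/grpc/status`, while the span surface is a hypothetical stand-in:
+
+```go
+// spanStatusSetter is the minimal, hypothetical span surface needed here.
+type spanStatusSetter interface {
+    SetStatus(code uint32, message string)
+}
+
+// setStatusFromGRPC copies the gRPC status onto the span unchanged,
+// relying on the 1:1 correspondence between the two code sets.
+func setStatusFromGRPC(span spanStatusSetter, err error) {
+    s, _ := status.FromError(err) // err == nil yields code OK
+    span.SetStatus(uint32(s.Code()), s.Message())
+}
+```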

### Events

-In the lifetime of a gRPC stream, an event for each message sent/received on
-client and server spans SHOULD be created with the following attributes:
+In the lifetime of a gRPC stream, an event for each message sent/received on client and server spans SHOULD be created with the following attributes:

```
-> [time],
@@ -73,8 +69,6 @@ client and server spans SHOULD be created with the following attributes:
    "message.uncompressed_size" = 
```

-The `message.id` MUST be calculated as two different counters starting from `1`
-one for sent messages and one for received message. This way we guarantee that
-the values will be consistent between different implementations. In case of
-unary calls only one sent and one received message will be recorded for both
-client and server spans.
+The `message.id` MUST be calculated as two different counters starting from `1`: one for sent messages and one for received messages.
+This way we guarantee that the values will be consistent between different implementations.
+In the case of unary calls, only one sent and one received message will be recorded for both client and server spans.
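+
+For illustration, a sketch of the two counters in Go; the type and method names are ours:
+
+```go
+// messageCounters implements the two independent message.id counters:
+// both start at 1 for the first message in their direction.
+type messageCounters struct {
+    sent     int64
+    received int64
+}
+
+func (c *messageCounters) nextSentID() int64 {
+    c.sent++
+    return c.sent
+}
+
+func (c *messageCounters) nextReceivedID() int64 {
+    c.received++
+    return c.received
+}
+```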

diff --git a/specification/trace/semantic_conventions/span-general.md b/specification/trace/semantic_conventions/span-general.md
index 48db19a55d9..072f307fcad 100644
--- a/specification/trace/semantic_conventions/span-general.md
+++ b/specification/trace/semantic_conventions/span-general.md
@@ -16,11 +16,9 @@ Particular operations may refer to or require some of these attributes.

## General network connection attributes

These attributes may be used for any network related operation.
-The `net.peer.*` attributes describe properties of the remote end of the network connection
-(usually the transport-layer peer, e.g. the node to which a TCP connection was established),
-while the `net.host.*` properties describe the local end.
-In an ideal situation, not accounting for proxies, multiple IP addresses or host names,
-the `net.peer.*` properties of a client are equal to the `net.host.*` properties of the server and vice versa.
+The `net.peer.*` attributes describe properties of the remote end of the network connection (usually the transport-layer peer, e.g. the node to which a TCP connection was established), while the `net.host.*` properties describe the local end.
+In an ideal situation, not accounting for proxies, multiple IP addresses or host names, the `net.peer.*` properties of a client are equal to the `net.host.*` properties of the server and vice versa.

| Attribute name | Notes and examples |
| :--------------- | :-------------------------------------------------------------------------------- |
@@ -55,17 +53,15 @@ For `Unix` and `pipe`, since the connection goes over the file system instead of

### `net.*.name` attributes

For IP-based communication, the name should be a DNS host name.
-For `net.peer.name`, this should be the name that was used to look up the IP address that was connected to
-(i.e., matching `net.peer.ip` if that one is set; e.g., `"example.com"` if connecting to an URL `https://example.com/foo`).
+For `net.peer.name`, this should be the name that was used to look up the IP address that was connected to (i.e., matching `net.peer.ip` if that one is set; e.g., `"example.com"` if connecting to an URL `https://example.com/foo`).
If only the IP address but no host name is available, reverse-lookup of the IP may optionally be used to obtain it.

-`net.host.name` should be the host name of the local host,
-preferably the one that the peer used to connect for the current operation.
-If that is not known, a public hostname should be preferred over a private one. However, in that case it may be redundant with information already contained in resources and may be left out.
+`net.host.name` should be the host name of the local host, preferably the one that the peer used to connect for the current operation.
+If that is not known, a public hostname should be preferred over a private one.
+However, in that case it may be redundant with information already contained in resources and may be left out.
It will usually not make sense to use reverse-lookup to obtain `net.host.name`, as that would result in static information that is better stored as resource information.

If `net.transport` is `"unix"` or `"pipe"`, the absolute path to the file representing it should be used as `net.peer.name` (`net.host.name` doesn't make sense in that context).
-If there is no such file (e.g., anonymous pipe),
-the name should explicitly be set to the empty string to distinguish it from the case where the name is just unknown or not covered by the instrumentation.
+If there is no such file (e.g., anonymous pipe), the name should explicitly be set to the empty string to distinguish it from the case where the name is just unknown or not covered by the instrumentation.

## General identity attributes

@@ -77,10 +73,9 @@ These attributes may be used for any operation with an authenticated and/or auth

| `enduser.role` | Actual/assumed role the client is making the request under extracted from token or application security context. |
| `enduser.scope` | Scopes or granted authorities the client currently possesses extracted from token or application security context. The value would come from the scope associated with an [OAuth 2.0 Access Token] or an attribute value in a [SAML 2.0 Assertion]. |

-These attributes describe the authenticated user driving the user agent making requests to the instrumented
-system. It is expected this information would be propagated unchanged from node-to-node within the system
-using the Correlation Context mechanism. These attributes should not be used to record system-to-system
-authentication attributes.
+These attributes describe the authenticated user driving the user agent making requests to the instrumented system.
+It is expected that this information will be propagated unchanged from node-to-node within the system using the Correlation Context mechanism.
+These attributes should not be used to record system-to-system authentication attributes.

Examples of where the `enduser.id` value is extracted from:

@@ -109,6 +104,4 @@ Examples of where the `enduser.id` value is extracted from:

[JavaEE/JakartaEE Servlet]: https://jakarta.ee/specifications/platform/8/apidocs/javax/servlet/http/HttpServletRequest.html
[Windows Communication Foundation]: https://docs.microsoft.com/en-us/dotnet/api/system.servicemodel.servicesecuritycontext?view=netframework-4.8

-Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by
-default and then provide a configuration parameter to turn on retention for use cases where the
-information is required and would not violate any policies or regulations.
+Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by default and then provide a configuration parameter to turn on retention for use cases where the information is required and would not violate any policies or regulations.
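+
+For illustration, a hedged Go sketch of such a default-off retention switch; the function and flag are hypothetical, not a prescribed SDK API, and only the standard `strings` package is assumed:
+
+```go
+// dropEnduserAttributes removes enduser.* attributes unless retention
+// has been explicitly enabled via configuration.
+func dropEnduserAttributes(attrs map[string]string, retainEnduser bool) map[string]string {
+    if retainEnduser {
+        return attrs
+    }
+    out := make(map[string]string, len(attrs))
+    for k, v := range attrs {
+        if strings.HasPrefix(k, "enduser.") {
+            continue // dropped by default, per the guidance above
+        }
+        out[k] = v
+    }
+    return out
+}
+```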