feat: add distributed tracing for webhook handling and PipelineRun timing #2605

Merged
zakisk merged 1 commit into tektoncd:main from ci-operator:distributed-tracing on Apr 17, 2026

@ci-operator
Contributor

@ci-operator ci-operator commented Mar 25, 2026

Description

Add OpenTelemetry distributed tracing to Pipelines-as-Code. When tracing is
enabled via the pipelines-as-code-config-observability ConfigMap, PaC emits
trace spans for webhook event processing and PipelineRun lifecycle timing.

Controller: Emits a PipelinesAsCode:ProcessEvent span for each webhook
event. Extracts W3C trace context from inbound webhook headers when present,
otherwise creates a new root span. Propagates the span context onto created
PipelineRuns via the tekton.dev/pipelinerunSpanContext annotation, enabling
end-to-end traces when downstream controllers also have tracing enabled.

Watcher: Emits waitDuration (creation → start) and executeDuration
(start → completion) timing spans for completed PipelineRuns, using resource
timestamps for accurate wall-clock timing.

Configuration

Tracing uses two ConfigMaps (both already exist in the PaC deployment manifests):

  1. pipelines-as-code-config-observability — tracing protocol, endpoint, and
    sampling rate. The controller and watcher locate this ConfigMap via the
    CONFIG_OBSERVABILITY_NAME environment variable already set in their deployment
    YAMLs. Changes require a pod restart — the trace exporter is created once at
    startup.

  2. pipelines-as-code — three new keys (tracing-label-action,
    tracing-label-application, tracing-label-component) configure which
    PipelineRun labels are read for span attributes. Changes are picked up
    automatically without restart.
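
As a rough illustration of the second ConfigMap, the sketch below assumes that each `tracing-label-*` value names a PipelineRun label whose value becomes a span attribute. The attribute names (`example.action` and so on), the `labelAttributes` helper, and the example label are hypothetical; the real attribute names are in docs/content/docs/operations/tracing.md.

```go
package main

import "fmt"

// settingToAttribute maps the three new ConfigMap keys to span attribute
// names. The attribute names on the right are hypothetical placeholders.
var settingToAttribute = map[string]string{
	"tracing-label-action":      "example.action",
	"tracing-label-application": "example.application",
	"tracing-label-component":   "example.component",
}

// labelAttributes looks up the label named by each configured setting and
// returns the attributes for labels that are present and non-empty.
func labelAttributes(settings, labels map[string]string) map[string]string {
	attrs := map[string]string{}
	for key, attrName := range settingToAttribute {
		labelName := settings[key]
		if labelName == "" {
			continue // setting not configured
		}
		if value, ok := labels[labelName]; ok && value != "" {
			attrs[attrName] = value
		}
	}
	return attrs
}

func main() {
	// Illustrative values only.
	settings := map[string]string{"tracing-label-application": "app.kubernetes.io/part-of"}
	labels := map[string]string{"app.kubernetes.io/part-of": "checkout"}
	fmt.Println(labelAttributes(settings, labels))
}
```

Because the lookup runs per PipelineRun against the current settings, a design like this would pick up ConfigMap changes without a restart, in line with the behaviour described above.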

See docs/content/docs/operations/tracing.md for the full schema, attribute
tables, and deployment guidance.

Testing tracing on a cluster

The default config/305-config-observability.yaml ships with tracing disabled.
To enable it, add tracing-protocol, tracing-endpoint, and
tracing-sampling-rate to the observability ConfigMap and restart the controller
and watcher pods. Note that ko apply -f config/ will overwrite the
observability ConfigMap with defaults — re-apply tracing fields afterward.
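
For readers who want a feel for the three keys, here is a minimal sketch of reading them from ConfigMap data. It is not the PR's parser: `tracingConfig` and `parseTracingConfig` are hypothetical names, the `[0, 1]` bound on the sampling rate is an assumption, and the protocol and endpoint values in `main` are placeholders, since the accepted values are documented in tracing.md rather than in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// tracingConfig holds the three observability keys named in the description.
type tracingConfig struct {
	Protocol     string
	Endpoint     string
	SamplingRate float64
}

// parseTracingConfig reads the tracing keys from ConfigMap data.
// Assumption: an unset sampling rate is reported as 0 and a set one must lie
// in [0, 1]; the real defaults and validation may differ.
func parseTracingConfig(data map[string]string) (tracingConfig, error) {
	cfg := tracingConfig{
		Protocol: data["tracing-protocol"],
		Endpoint: data["tracing-endpoint"],
	}
	raw, ok := data["tracing-sampling-rate"]
	if !ok {
		return cfg, nil
	}
	rate, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return tracingConfig{}, fmt.Errorf("parsing tracing-sampling-rate %q: %w", raw, err)
	}
	// The negated form also rejects NaN.
	if !(rate >= 0 && rate <= 1) {
		return tracingConfig{}, fmt.Errorf("tracing-sampling-rate %v is outside [0, 1]", rate)
	}
	cfg.SamplingRate = rate
	return cfg, nil
}

func main() {
	cfg, err := parseTracingConfig(map[string]string{
		"tracing-protocol":      "grpc",                   // placeholder value
		"tracing-endpoint":      "collector.example:4317", // placeholder value
		"tracing-sampling-rate": "0.25",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```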

Linked Issue

https://redhat.atlassian.net/browse/PVO11Y-5067

Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing

Unit tests cover span attribute emission, trace context injection/extraction,
W3C parentage (incoming context honored vs. new root created), malformed
annotation handling, result enum mapping, and message truncation. Manually
verified end-to-end on a local cluster with Jaeger.

AI Assistance

  • I have used AI assistance for this PR.

Co-authored-by trailers are on each commit.

Submitter Checklist

  • Commit messages follow the project guide
  • make test and make lint pass locally
  • Documentation updated for user-facing changes
  • Unit tests added for code changes

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of Pipelines-as-Code by integrating OpenTelemetry distributed tracing. This allows operators and developers to gain deeper insights into the performance and flow of webhook event processing and the various stages of PipelineRun execution. By propagating trace context, it facilitates a unified view of operations across PaC and Tekton Pipelines, streamlining debugging and performance analysis.

Highlights

  • Distributed Tracing Implementation: Introduced OpenTelemetry distributed tracing to Pipelines-as-Code, enabling visibility into webhook event processing and PipelineRun lifecycle timing.
  • Webhook Event Tracing: The controller now emits a 'PipelinesAsCode:ProcessEvent' span, covering the full webhook event lifecycle from receipt to PipelineRun creation, with relevant VCS attributes.
  • PipelineRun Timing Spans: The watcher emits 'waitDuration' (creation to start) and 'executeDuration' (start to completion) spans for completed PipelineRuns, using accurate resource timestamps.
  • Trace Context Propagation: Trace context is propagated to created PipelineRuns via the 'tekton.dev/pipelinerunSpanContext' annotation, allowing for end-to-end traces when Tekton Pipelines also has tracing enabled.
  • Configurable Observability: Tracing is configured through the existing 'pipelines-as-code-config-observability' ConfigMap using new keys: 'tracing-protocol', 'tracing-endpoint', and 'tracing-sampling-rate'.
  • New Documentation: Added comprehensive documentation on how to enable and configure distributed tracing for Pipelines-as-Code.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces OpenTelemetry distributed tracing to Pipelines-as-Code. Key changes include integrating tracing into the event handling flow, propagating trace context to PipelineRuns via annotations, and emitting timing spans for PipelineRun lifecycle events. The observability configuration has been updated to include new tracing options, but the removal of existing metrics-protocol and metrics-endpoint configurations requires clarification and documentation. Additionally, an improvement opportunity was identified to ensure consistent tracing data by always setting VCS repository and revision attributes, even when empty.

I am having trouble creating individual review comments, so my feedback is listed below.

config/305-config-observability.yaml (24-25)

high

The metrics-protocol and metrics-endpoint configurations are being removed from the data section. This change was not explicitly mentioned in the pull request description, which focuses on adding tracing. If these metrics were actively used, their removal could be a breaking change or an unintended side effect. Please clarify if this removal is intentional and, if so, document it in the PR description or release notes.

pkg/adapter/adapter.go (212-217)

medium

For better consistency in tracing data, consider always setting the VCSRepositoryKey and VCSRevisionKey attributes, even if l.event.URL or l.event.SHA are empty. This ensures the attribute key is always present in the span, which can simplify querying and analysis in tracing backends. You could set them to an empty string or a placeholder like "unknown" if the values are not available, instead of omitting the attribute entirely.

if l.event.URL != "" {
	span.SetAttributes(tracing.VCSRepositoryKey.String(l.event.URL))
} else {
	span.SetAttributes(tracing.VCSRepositoryKey.String(""))
}
if l.event.SHA != "" {
	span.SetAttributes(tracing.VCSRevisionKey.String(l.event.SHA))
} else {
	span.SetAttributes(tracing.VCSRevisionKey.String(""))
}

@ci-operator ci-operator force-pushed the distributed-tracing branch from 393b9d3 to 3870a7f on March 25, 2026 15:04
@chmouel
Member

chmouel commented Mar 25, 2026

/ok-to-test

@chmouel
Member

chmouel commented Mar 27, 2026

@zakisk can you have a look pls

@zakisk
Member

zakisk commented Mar 30, 2026

@ci-operator for the E2E run, we're working on a permission workaround in PR #2611

@chmouel
Member

chmouel commented Mar 30, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements distributed tracing for Pipelines-as-Code using OpenTelemetry. It adds logic to extract trace context from incoming webhook headers, propagate it to PipelineRuns via a new annotation, and emit timing spans for event processing and PipelineRun execution. The PR also includes configuration updates and new documentation. Feedback points out that an error during JSON marshalling of the span context is currently ignored and should be logged to assist with debugging.
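
To make the marshalling point concrete, here is a hedged sketch of writing and reading the span-context annotation with errors surfaced rather than dropped. It assumes the annotation value is a JSON object of propagation headers such as `traceparent`; `setSpanContextAnnotation` and `readSpanContextAnnotation` are hypothetical helpers, not the functions in pkg/kubeinteraction/labels.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// spanContextAnnotation is the annotation named in the PR description.
const spanContextAnnotation = "tekton.dev/pipelinerunSpanContext"

// setSpanContextAnnotation serializes the propagation carrier to JSON and
// stores it in annotations, which must be non-nil. json.Marshal on a
// map[string]string is not expected to fail in practice, but returning the
// error keeps a failure visible to the caller.
func setSpanContextAnnotation(annotations, carrier map[string]string) error {
	if len(carrier) == 0 {
		return nil // nothing to propagate
	}
	raw, err := json.Marshal(carrier)
	if err != nil {
		return fmt.Errorf("marshalling span context: %w", err)
	}
	annotations[spanContextAnnotation] = string(raw)
	return nil
}

// readSpanContextAnnotation decodes the annotation back into a carrier.
// A malformed value is logged and treated as absent.
func readSpanContextAnnotation(annotations map[string]string) (map[string]string, bool) {
	raw, ok := annotations[spanContextAnnotation]
	if !ok {
		return nil, false
	}
	carrier := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &carrier); err != nil {
		log.Printf("ignoring malformed %s annotation: %v", spanContextAnnotation, err)
		return nil, false
	}
	return carrier, len(carrier) > 0
}

func main() {
	annotations := map[string]string{}
	carrier := map[string]string{
		"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
	}
	if err := setSpanContextAnnotation(annotations, carrier); err != nil {
		log.Printf("span context not propagated: %v", err)
	}
	fmt.Println(annotations[spanContextAnnotation])
}
```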

Comment thread pkg/kubeinteraction/labels.go Outdated
@chmouel
Member

chmouel commented Mar 30, 2026

this conflicts with the recently merged 0faad24

@chmouel
Member

chmouel commented Mar 30, 2026

/ok-to-test

@zakisk
Member

zakisk commented Mar 30, 2026

@ci-operator there are still some conflicts there I think, can you please fix them

@zakisk zakisk removed the ok-to-test label Mar 30, 2026
@zakisk
Member

zakisk commented Mar 30, 2026

due to recent changes in #2622, the label is no longer deleted automatically, because we changed how it gets removed.

@ci-operator ci-operator force-pushed the distributed-tracing branch from d01d86f to f190d03 on March 30, 2026 16:45
@zakisk
Member

zakisk commented Mar 30, 2026

/ok-to-test

Comment thread docs/content/docs/operations/tracing.md
Comment thread pkg/adapter/adapter.go Outdated
Comment thread pkg/adapter/adapter.go Outdated
Comment thread pkg/tracing/tracing.go Outdated
Comment thread pkg/tracing/tracing.go Outdated
@chmouel
Member

chmouel commented Mar 31, 2026

/ok-to-test

Comment thread pkg/adapter/adapter.go
Comment on lines -239 to -243
// payload validation
var event map[string]any
if err := json.Unmarshal([]byte(reqBody), &event); err != nil {
	return nil, &log, fmt.Errorf("invalid event body format: %w", err)
}
Member

why is this validation deleted?

Contributor Author

Unintentional - restored.

Member

@zakisk zakisk Apr 16, 2026

I know, it may have been a Tab in Cursor that deleted the next few lines 😄 no worries

Comment thread pkg/adapter/adapter.go Outdated
)

span.SetAttributes(
	semconv.VCSProviderNameKey.String(gitProvider.GetConfig().Name),
Member

I am getting empty provider name

Image

Contributor Author

Fixed. The provider name isn't resolved until SetClient runs, which happens after handleEvent. Moved the attribute to SetupAuthenticatedClient where it's actually available.

@zakisk
Member

zakisk commented Apr 15, 2026

for linters and go-testing, can you please set up pre-commit?

@chmouel
Member

chmouel commented Apr 15, 2026

/ok-to-test

@ci-operator ci-operator force-pushed the distributed-tracing branch from ed26d26 to 286cdd7 on April 15, 2026 17:44
@chmouel
Member

chmouel commented Apr 15, 2026

/label ok-to-test

@chmouel
Member

chmouel commented Apr 15, 2026

/ok-to-test

@pipelines-as-code

✅ Added labels: ok-to-test.

@chmouel
Member

chmouel commented Apr 15, 2026

/ok-to-test

@chmouel
Member

chmouel commented Apr 15, 2026

/retest

| --- | --- |
| `vcs.provider.name` | Git provider name |
| `vcs.repository.url.full` | Repository URL |
| `vcs.ref.head.revision` | Head commit SHA |
Member

https://opentelemetry.io/docs/specs/semconv/registry/attributes/vcs/#vcs-change-id
@ci-operator what about also adding vcs.change.id, which I guess is the pull request number? I know you've added the revision, but it would be helpful for the tracer to have the pull request number as well.
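
One way to picture this suggestion: emit `vcs.change.id` alongside the attributes in the table above, but only for events that carry a pull request number. The sketch below is hypothetical (the `vcsAttributes` helper and the example values are not from the PR), and whether the merged change adopted the attribute is not stated in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// vcsAttributes builds the VCS span attributes and adds vcs.change.id only
// when the event has a pull request number (for example, not on push events).
func vcsAttributes(provider, repoURL, sha string, pullRequestNumber int) map[string]string {
	attrs := map[string]string{
		"vcs.provider.name":       provider,
		"vcs.repository.url.full": repoURL,
		"vcs.ref.head.revision":   sha,
	}
	if pullRequestNumber > 0 {
		attrs["vcs.change.id"] = strconv.Itoa(pullRequestNumber)
	}
	return attrs
}

func main() {
	// Example values are illustrative only.
	attrs := vcsAttributes("github", "https://github.com/example/repo", "3870a7f", 2605)
	fmt.Println(attrs["vcs.change.id"])
}
```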

@zakisk
Member

zakisk commented Apr 16, 2026

@ci-operator it looks good! can you please fix the linters?

Instrument PaC with OpenTelemetry tracing. The controller emits a
ProcessEvent span for each webhook, honoring incoming W3C trace context
when present. The watcher emits waitDuration and executeDuration timing
spans for completed PipelineRuns. Trace context is propagated onto
created PipelineRuns via annotation for end-to-end delivery traces.

Tracing configuration uses the existing pipelines-as-code-config-observability
ConfigMap (located via CONFIG_OBSERVABILITY_NAME env var). Label-sourced
span attributes are configurable via the main pipelines-as-code ConfigMap.

See docs/content/docs/operations/tracing.md for the schema.
@zakisk
Member

zakisk commented Apr 17, 2026

ok-to-test

@zakisk
Member

zakisk commented Apr 17, 2026

/ok-to-test

@zakisk
Member

zakisk commented Apr 17, 2026

/lgtm

@pipelines-as-code

LGTM Vote Breakdown

  • Current valid votes: 0/1
  • Voting required for approval: 1

Votes Summary:

Reviewer Permission Valid Vote

Automated by the PAC Boussole 🧭

@zakisk zakisk merged commit bd9f468 into tektoncd:main Apr 17, 2026
17 of 26 checks passed
@zakisk
Member

zakisk commented Apr 17, 2026

@ci-operator thanks for your contributions!

6 participants