Commit 66af450

GitHub Actions and jmagak authored and committed
Configuring log aggregation and observability for sonataflow
1 parent cff4db8 commit 66af450

File tree

5 files changed: +92 -121 lines changed

assemblies/extend_orchestrator-in-rhdh/assembly-configure-opentelemetry-for-sonataflow-workflows.adoc

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ include::../modules/extend_orchestrator-in-rhdh/proc-enable-opentelemetry-for-so
 include::../modules/extend_orchestrator-in-rhdh/proc-configure-telemetry-exporters.adoc[leveloffset=+1]
 
 // SonataFlow OpenTelemetry attributes and events
-include::../modules/extend_orchestrator-in-rhdh/proc-configure-observability-tools.adoc[leveloffset=+1]
+include::../modules/extend_orchestrator-in-rhdh/ref-observability-configuration-examples.adoc[leveloffset=+1]
 
 // Troubleshooting OpenTelemetry integration
 include::../modules/extend_orchestrator-in-rhdh/ref-troubleshoot-opentelemetry-connectivity.adoc[leveloffset=+1]

modules/extend_orchestrator-in-rhdh/proc-configure-telemetry-exporters.adoc

Lines changed: 11 additions & 10 deletions
@@ -1,24 +1,25 @@
 :_mod-docs-content-type: PROCEDURE
 
 [id="configure-telemetry-exporters_{context}"]
-= Export telemetry data to an observability platform
+= Configure telemetry data exporters for observability platforms
 
 [role="_abstract"]
-When monitoring serverless workflows in a distributed environment, export trace data efficiently to an external observability platform (such as Jaeger or an OpenTelemetry Collector). Configuring your export strategy and externalizing environment variables makes sure that your telemetry data is delivered reliably without hardcoding sensitive configurations in your production builds.
+To monitor serverless workflows in a distributed environment, export trace data to an external observability platform, such as Jaeger or an OpenTelemetry Collector. By configuring an export strategy and externalizing environment variables, you ensure reliable telemetry delivery and avoid hardcoding configurations in production builds.
 
 .Prerequisites
 
 * You have enabled OpenTelemetry in your workflow.
+* An observability platform (Jaeger or OpenTelemetry Collector) is available in your cluster.
 
 .Procedure
 
-. Configure your export strategy
+. Define your export strategy.
 +
-Choose one of the following export strategies based on the requirements of your observability platform:
+Choose an export strategy that matches your observability platform requirements:
 
-.. OTLP exporter with batch processing (Recommended)
+** OTLP exporter with batch processing (Recommended)
 +
-For production environments, using an OTLP exporter with batch processing reduces network overhead and improves performance, as shown:
+For production environments, use an OTLP exporter with batch processing to reduce network overhead and improve performance.
 +
 [source,bash]
 ----
@@ -34,9 +35,9 @@ quarkus.otel.bsp.export.timeout=2s
 quarkus.otel.bsp.max.queue.size=2048
 ----
 
-.. Direct Export to an external platform
+** Direct export to an external platform
 +
-For simpler setups or direct integrations, you can configure a direct export as shown:
+For development or simple integrations, use a direct export configuration.
 +
 [source,bash]
 ----
@@ -46,9 +47,9 @@ quarkus.otel.exporter.otlp.protocol=grpc
 quarkus.otel.traces.exporter=cdi
 ----
 
-. Externalize configuration for production deployments
+. Externalize the configuration for production deployments.
 +
-To maintain secure and flexible production deployments, use environment variables to externalize your OpenTelemetry configuration:
+Use environment variables to externalize your OpenTelemetry configuration. This ensures that your deployment remains secure and flexible across environments.
 +
 [source,bash]
 ----
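The externalization step above relies on Quarkus reading configuration from environment variables. The following sketch shows the standard MicroProfile Config mapping from a property name to its environment-variable form (non-alphanumerics become `_`, then uppercase); the `prop_to_env` helper is illustrative, not part of SonataFlow, and the endpoint value is the example's own.

```shell
#!/bin/sh
# Sketch: derive the environment-variable name for a Quarkus property.
# MicroProfile Config convention: replace non-alphanumerics with "_",
# then uppercase.
prop_to_env() {
  printf '%s' "$1" | tr -c '[:alnum:]' '_' | tr '[:lower:]' '[:upper:]'
}

# Externalize the OTLP endpoint instead of hardcoding it in the image:
export "$(prop_to_env quarkus.otel.exporter.otlp.endpoint)=http://jaeger-collector.observability.svc.cluster.local:4317"
```

Setting `QUARKUS_OTEL_EXPORTER_OTLP_ENDPOINT` in the Deployment spec then overrides the value baked into `application.properties`.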

modules/extend_orchestrator-in-rhdh/proc-enable-opentelemetry-for-sonataflow-workflows.adoc

Lines changed: 18 additions & 29 deletions
@@ -6,28 +6,27 @@
 [role="_abstract"]
 Enable OpenTelemetry in your SonataFlow project to begin collecting distributed traces, metrics, and logs.
 
-The OpenTelemetry integration for SonataFlow provides the following:
+To enable observability features such as tracing and metrics in the SonataFlow runtime, you must add the OpenTelemetry addon and configure the workflow properties. The `sonataflow-addons-quarkus-opentelemetry` addon provides a standard configuration with minimal setup required.
 
-Distributed tracing:: Tracks workflow execution across multiple services and steps.
-Metrics collection:: Monitors performance, duration, and success rates.
-Log aggregation:: Centralized logging with trace correlation.
-Context propagation:: Maintains trace context across workflow boundaries and async operations.
+The OpenTelemetry integration for SonataFlow includes the following capabilities:
 
-[NOTE]
-====
-To enable observability features like tracing and metrics in the SonataFlow runtime, use the `sonataflow-addons-quarkus-opentelemetry` addon. This addon provides a standard configuration that requires little additional setup.
-====
+* Distributed tracing: Track workflow execution across services and steps.
+
+* Metrics collection: Monitor performance, duration, and success rates.
+
+* Log aggregation: Centralize logs with trace correlation.
+
+* Context propagation: Maintain trace context across workflow boundaries and asynchronous operations.
 
 .Prerequisites
 
-* You have installed and configured a SonataFlow Operator.
-* You have access to deploy observability infrastructure.
-* You have Kubernetes or OpenShift cluster with appropriate resources.
-* You understand OpenTelemetry concepts.
+* You have installed and configured the SonataFlow Operator.
+* You have `cluster-admin` or equivalent permissions to deploy observability infrastructure and modify ConfigMaps.
+* A Kubernetes or OpenShift cluster is available.
 
 .Procedure
 
-. Add the SonataFlow OpenTelemetry addon to your `QUARKUS_EXTENSIONS` environment variable during the image build process:
+. Add the OpenTelemetry addon to the `QUARKUS_EXTENSIONS` environment variable during the image build process:
 +
 [source,bash]
 ----
@@ -36,7 +35,7 @@ export QUARKUS_EXTENSIONS="${QUARKUS_EXTENSIONS},org.apache.kie.sonataflow:sonat
 
 . Open the `{workflow-name}-props` ConfigMap for your workflow.
 
-. In the `application.properties` section, enable the OpenTelemetry integration:
+. In the `application.properties` section, enable the OpenTelemetry integration and configure the service attributes:
 +
 [source,bash]
 ----
@@ -69,37 +68,27 @@ sonataflow.otel.spans.enabled=true
 sonataflow.otel.events.enabled=true
 ----
 
-. Save the ConfigMap and restart the workflow pod.
+. Save the ConfigMap and restart the workflow pod to apply the changes.
 
 .Verification
 
-* Check OpenTelemetry addon is loaded:
+* Verify that the OpenTelemetry addon is loaded by checking the pod logs:
 +
 [source,bash]
 ----
 kubectl logs -n workflows deployment/onboarding-workflow | grep "sonataflow-addons-quarkus-opentelemetry"
 ----
 
-* Verify trace export:
+* Verify successful trace export:
 +
 [source,bash]
 ----
-# Look for successful trace exports
 kubectl logs -n workflows deployment/greeting | grep -i "export\|batch"
 ----
 
-* Check Jaeger health:
+* Confirm that the observability backend (for example, Jaeger) is receiving data:
 +
 [source,bash]
 ----
-# Verify Jaeger is receiving data
 kubectl logs -n observability deployment/jaeger | grep -i "span\|trace"
-----
-
-* Test with debug exporter:
-+
-[source,bash]
-----
-# Temporarily enable console logging
-%dev.quarkus.otel.traces.exporter=logging
 ----
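The verification steps in this module pipe pod logs through `grep`. A small helper in the same spirit can make the check reusable in CI; this is a sketch, and `check_otel_addon` is a hypothetical helper name, not part of the SonataFlow tooling.

```shell
#!/bin/sh
# Sketch: scan a log stream (stdin) for the OpenTelemetry addon marker
# used in the verification step. "check_otel_addon" is a hypothetical
# helper, not part of SonataFlow.
check_otel_addon() {
  if grep -q "sonataflow-addons-quarkus-opentelemetry"; then
    echo "addon loaded"
  else
    echo "addon missing"
  fi
}

# Usage against a live pod (deployment and namespace names as in the
# examples above):
# kubectl logs -n workflows deployment/onboarding-workflow | check_otel_addon
```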

modules/extend_orchestrator-in-rhdh/proc-configure-observability-tools.adoc renamed to modules/extend_orchestrator-in-rhdh/ref-observability-configuration-examples.adoc

Lines changed: 24 additions & 42 deletions
@@ -1,18 +1,16 @@
 :_mod-docs-content-type: CONCEPT
 
-[id="configure-observability-tools_{context}"]
-= Tools for configuring observability
+[id="observability-configuration-examples_{context}"]
+= Observability tool configuration examples
 
 [role="_abstract"]
-The following are working examples for two essential observability tools that integrate seamlessly with SonataFlow OpenTelemetry implementation.
+Use the following examples to deploy and integrate Jaeger and Loki with the SonataFlow OpenTelemetry implementation. These examples include configurations for development and production environments.
 
-. Jaeger distributed tracing tool.
+. Jaeger distributed tracing
 +
 Jaeger provides distributed tracing visualization for SonataFlow workflows.
 +
-Deploy Jaeger on OpenShift or Kubernetes as shown in the following examples:
-+
-All-in-one deployment (development or testing):::
+.Jaeger all-in-one deployment (development and testing)
 +
 [source,yaml]
 ----
@@ -104,8 +102,8 @@ spec:
       targetPort: 16686
   type: ClusterIP
 ----
-
-OpenShift route for UI access:::
++
+.OpenShift route for UI access
 +
 [source,yaml]
 ----
@@ -124,8 +122,10 @@ spec:
     termination: edge
     insecureEdgeTerminationPolicy: Redirect
 ----
-
-Workflow Configuration for Jaeger:::
++
+.Workflow configuration for Jaeger
++
+Add these properties to the `application.properties` file of your workflow:
 +
 [source,bash]
 ----
@@ -137,8 +137,8 @@ quarkus.otel.traces.exporter=cdi
 # Additional Jaeger-specific propagation
 quarkus.otel.propagators=tracecontext,baggage,jaeger
 ----
-
-Production Deployment with Elasticsearch:::
++
+.Production deployment with Elasticsearch
 +
 For production environments, use the Jaeger Operator with Elasticsearch storage:
 +
@@ -176,13 +176,11 @@ spec:
         memory: 512Mi
 ----
 
-. Loki log aggregation tool.
-+
-Grafana Loki provides scalable log aggregation with native OpenTelemetry support, enabling comprehensive log analysis and correlation with traces. Loki natively supports OpenTelemetry Protocol (OTLP) for direct log ingestion from SonataFlow workflows.
+. Loki log aggregation
 +
-Deploy Loki on OpenShift or Kubernetes as shown in the following examples:
+Loki supports OpenTelemetry Protocol (OTLP) for direct log ingestion from SonataFlow workflows.
 +
-Loki configuration for OpenTelemetry:::
+.Loki configuration for OpenTelemetry
 +
 [source,yaml]
 ----
@@ -237,8 +235,8 @@ data:
       prefix: index_
       period: 24h
 ----
-
-Loki deployment:::
++
+.Loki deployment
 +
 [source,yaml]
 ----
@@ -321,10 +319,10 @@ spec:
       targetPort: 9096
   type: ClusterIP
 ----
-
-. To configure workflow for Loki:
 +
-* Direct Connection to Loki (Recommended for simplicity):
+.Workflow configuration for Loki and Jaeger
++
+To route logs to Loki and traces to Jaeger, use the following configuration:
 +
 [source,bash]
 ----
@@ -353,27 +351,11 @@ quarkus.otel.resource.attributes=\
   deployment.environment=production
 ----
 
-. Optional: OpenTelemetry collector for advanced processing:
-+
-For production environments requiring log processing, enrichment, or multi-backend export, you can optionally deploy an OpenTelemetry collector between your workflows and the observability backends.
-+
-The following are the benefits of using a collector:
-
-* Log enrichment with Kubernetes metadata
-* Filtering and sampling
-* Multi-destination export (for example, logs to both Loki and external SIEM)
-* Centralized processing and transformation
-
-. To quickly setup the collector:
+. Optional: OpenTelemetry collector for advanced processing
 +
-[source,bash]
-----
-# Change workflow configuration to send to collector instead
-quarkus.otel.exporter.otlp.endpoint=http://otel-collector.observability.svc.cluster.local:4317
-----
-
-. To configure collector:
+Deploy an OpenTelemetry collector between workflows and backends for advanced log processing, filtering, and multi-destination export.
 +
+.Collector pipeline configuration
 [source,yaml,subs="+attributes,+quotes"]
 ----
 # Collector routes to both Jaeger and Loki
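The endpoints in these examples all follow the in-cluster Service DNS pattern `<service>.<namespace>.svc.cluster.local:<port>`. The following sketch generates such URLs; `otlp_endpoint` is a hypothetical helper, and only service names and ports that appear in the examples are used.

```shell
#!/bin/sh
# Sketch: build an in-cluster OTLP endpoint URL from a Service name,
# namespace, and port, following the Kubernetes Service DNS convention.
otlp_endpoint() {
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
}

# Endpoints matching the examples (adjust to your Service definitions):
otlp_endpoint jaeger-collector observability 4317  # direct export to Jaeger
otlp_endpoint otel-collector observability 4317    # export via the collector
```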

modules/extend_orchestrator-in-rhdh/ref-troubleshoot-opentelemetry-connectivity.adoc

Lines changed: 38 additions & 39 deletions
@@ -4,70 +4,69 @@
 = Troubleshoot OpenTelemetry connectivity
 
 [role="_abstract"]
-Diagnose and resolve missing observability data and context propagation.
+Diagnose and resolve issues with missing observability data or failed context propagation in SonataFlow.
 
-If traces or logs are missing, you lose the visibility required to debug application issues or trace execution paths across services. You must quickly diagnose and resolve misconfigurations in your OpenTelemetry deployment so that monitoring is restored.
+.OpenTelemetry troubleshooting guide
+[cols="25%,35%,40%", options="header"]
+|===
+| Symptom
+| Potential cause
+| Resolution
 
-Traces do not appear::
+| Traces do not appear in the dashboard.
+| OpenTelemetry is disabled or the endpoint is unreachable.
+| Verify the `quarkus.otel.enabled` property and test endpoint connectivity.
 
-.. Check workflow configuration:
-+
-[source,bash]
-----
-# Verify OpenTelemetry is enabled
-kubectl get cm onboarding-workflow-props -n workflows -o yaml
+| Authentication errors (`401`/`403`).
+| Missing or invalid authorization headers.
+| Configure the `quarkus.otel.exporter.otlp.headers` property with a valid token.
 
-# Check pod logs for OpenTelemetry initialization
-kubectl logs -n workflows deployment/onboarding-workflow | grep -i "otel\|trace"
-----
+| High memory usage in the collector.
+| Large telemetry batches or high traffic volume.
+| Implement a `memory_limiter` processor in the collector configuration.
 
-.. Verify connectivity:
+| Context is lost between workflow steps.
+| Incorrect propagator configuration.
+| Ensure `quarkus.otel.propagators` includes all required formats (for example, `tracecontext` and `baggage`).
+|===
+
+.Diagnosing missing traces
+
+.. Verify that OpenTelemetry is enabled in the workflow ConfigMap:
 +
 [source,bash]
 ----
-# Test Jaeger endpoint from workflow pod
-kubectl exec -n workflows deployment/greeting -- curl -v http://jaeger-collector.observability.svc.cluster.local:4317
+kubectl get cm {workflow-name}-props -n workflows -o yaml
 ----
 
-Authentication issues::
-+
-.. For platforms requiring authentication, configure headers:
+.. Check the pod logs for initialization errors:
 +
 [source,bash]
 ----
-quarkus.otel.exporter.otlp.headers=authorization=Bearer ${API_TOKEN}
+kubectl logs deployment/{deployment-name} -n workflows | grep -i "otel"
 ----
 
-.. To authenticate the test:
+.. Test the connection to the Jaeger collector from within the workflow pod:
 +
 [source,bash]
 ----
-curl -v -X POST \
-  -H "Authorization: Bearer YOUR_TOKEN" \
-  -H "Content-Type: application/x-protobuf" \
-  https://your-endpoint.com/v1/traces
+kubectl exec deployment/{deployment-name} -n workflows -- curl -v http://jaeger-collector.observability.svc.cluster.local:4317
 ----
 
-High memory usage::
-+
-.. Configure memory limits in collector:
-+
-[source,yaml]
+.Configuring authentication headers
+If your observability platform requires authentication, add the following property to your `application.properties` file:
+
+[source,bash]
 ----
-processors:
-  memory_limiter:
-    check_interval: 1s
-    limit_mib: 1000
-    spike_limit_mib: 200
+quarkus.otel.exporter.otlp.headers=authorization=Bearer ${API_TOKEN}
 ----
 
-Context lost between steps::
-+
-Make sure there is a proper propagation:
-+
+.Resolving context propagation issues
+To ensure trace IDs are maintained across service boundaries, configure the following propagators and enable JSON logging to verify the IDs in the output:
+
 [source,bash]
 ----
-# Include all required propagators
+# Include required propagators
 quarkus.otel.propagators=tracecontext,baggage,jaeger
 
 # Enable JSON logging to verify trace IDs
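The troubleshooting table in this module maps symptoms to likely causes. For scripted checks, the same mapping can be sketched over the HTTP status code returned by an endpoint test; the `classify_export_error` helper and its status-code mapping are illustrative assumptions, not SonataFlow tooling.

```shell
#!/bin/sh
# Sketch: map an HTTP status code from an exporter endpoint test (for
# example, from curl) to the likely cause in the troubleshooting table.
# The mapping is an assumption for illustration only.
classify_export_error() {
  case "$1" in
    401|403) echo "authentication: check quarkus.otel.exporter.otlp.headers" ;;
    000)     echo "connectivity: endpoint unreachable" ;;
    2??)     echo "ok: endpoint reachable" ;;
    *)       echo "unknown: inspect collector logs" ;;
  esac
}

# Usage (illustrative; OTLP_ENDPOINT is assumed to be set):
# classify_export_error "$(curl -s -o /dev/null -w '%{http_code}' "$OTLP_ENDPOINT")"
```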
