APM Cypress Testing + Prometheus #2629

Merged
TackAdam merged 39 commits into opensearch-project:main from TackAdam:apmTesting
Apr 7, 2026

TackAdam (Collaborator) commented Mar 26, 2026

Description

This PR adds comprehensive Cypress integration tests for the APM (Application Performance Monitoring) Services page, including full Prometheus metrics integration for testing RED metrics (Rate, Error, Duration) widgets.

  • The APM tests need additional OpenSearch Dashboards flags that the existing test framework does not set. The workflow handles this by starting Dashboards conditionally:

if [ "${{ matrix.testgroups }}" = "apm_test" ]; then
  # APM tests require workspace, query enhancements, and dataset management features
  nohup yarn start --no-base-path --no-watch \
    --home.disableExperienceModal=true \
    --uiSettings.overrides["query:enhancements:enabled"]=true \
    --data_source.enabled=true \
    --workspace.enabled=true \
    --explore.enabled=true \
    --explore.discoverTraces.enabled=true \
    --datasetManagement.enabled=true | tee dashboard.log &
else
  # Other tests use minimal configuration
  nohup yarn start --no-base-path --no-watch --home.disableExperienceModal=true | tee dashboard.log &
fi

OpenSearch Version Upgrade

Both CI workflows now use OpenSearch 3.6.0 (previously 3.5.0):

  • integration-tests-workflow.yml
  • ftr-e2e-dashboards-observability-test.yml

Why: OpenSearch 3.6.0 includes a fix for the PromQL proxy that the APM metrics queries depend on (see opensearch-project/sql@054792c).

Test Coverage

  • APM Services Page: Configuration modal, dataset selection, Prometheus data source selection
  • RED Metrics Widgets: Fault rate, throughput, and latency visualization using live Prometheus queries
  • Application Map: Service topology visualization
  • Time Range Handling: Dynamic time adjustment ensures tests work regardless of when they run

Infrastructure

  • Prometheus Integration: Metrics server + Prometheus scraping for realistic APM metric testing
  • Correlated Test Data: Traces, logs, service maps, and metrics that are fully correlated
  • Local Development Support: Simple script to start Prometheus locally
  • CI Integration: Automated Prometheus setup in GitHub Actions workflow

How It Works

Data Flow

┌─────────────────────────────────────────────────────────────┐
│ Test Setup (before hook)                                     │
├─────────────────────────────────────────────────────────────┤
│ 1. Get time range (now ± 1 day)                             │
│ 2. Upload OpenSearch data (traces/logs/services)            │
│    - Adjusts timestamps to current time                      │
│ 3. Wait for Prometheus metrics                              │
│    - Verifies metrics server is serving data                 │
│    - Confirms Prometheus is scraping successfully            │
│ 4. Setup APM environment (workspace/datasets/connection)     │
└─────────────────────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────────┐
│ Metrics Server (.cypress/utils/metrics_server.js)           │
├─────────────────────────────────────────────────────────────┤
│ - HTTP server on localhost:8080                             │
│ - Serves test metrics in Prometheus text format             │
│ - Reads JSON metric files from apm_data/metrics/            │
│ - Serves without timestamps (Prometheus assigns scrape time)│
└─────────────────────────────────────────────────────────────┘
           │
           ▼ (scrapes every 5s)
┌─────────────────────────────────────────────────────────────┐
│ Prometheus Server                                            │
├─────────────────────────────────────────────────────────────┤
│ - Scrapes metrics server every 5 seconds                     │
│ - Assigns current timestamp to each metric                   │
│ - Stores in TSDB for querying                               │
│ - OpenSearch Dashboards queries via PromQL API              │
└─────────────────────────────────────────────────────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────────┐
│ OpenSearch Dashboards APM UI                                │
├─────────────────────────────────────────────────────────────┤
│ - Queries Prometheus for RED metrics                        │
│ - Queries OpenSearch for traces/logs/service map            │
│ - All data has current timestamps                           │
│ - Time range (now ± 1 day) captures all data               │
└─────────────────────────────────────────────────────────────┘
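The metrics server step in the diagram (JSON files in, Prometheus text format out, no timestamps) can be sketched roughly as follows. This is an illustration only, not the contents of metrics_server.js: the input shape `{ name, type, samples: [{ labels, value }] }` and the function names `renderPrometheusText` and `escapeLabelValue` are assumptions, since the layout of the files under apm_data/metrics/ is not shown here.

```javascript
// Hypothetical sketch; NOT the actual metrics_server.js implementation.
// Assumed input shape: { name, type, samples: [{ labels: {...}, value }] }.

// Escape a label value for the Prometheus text exposition format:
// backslash, double quote, and newline must be escaped.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// Render metrics as exposition text, one "# TYPE" line per metric
// followed by one line per sample.
function renderPrometheusText(metrics) {
  const lines = [];
  for (const metric of metrics) {
    lines.push(`# TYPE ${metric.name} ${metric.type}`);
    for (const sample of metric.samples) {
      const labels = Object.entries(sample.labels)
        .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
        .join(',');
      // No trailing timestamp: Prometheus stamps each sample at scrape time.
      lines.push(`${metric.name}{${labels}} ${sample.value}`);
    }
  }
  return lines.join('\n') + '\n';
}

// Illustrative input only; the values mirror the sample output in this PR.
const exampleText = renderPrometheusText([
  {
    name: 'request',
    type: 'counter',
    samples: [
      {
        labels: { service: 'cart', namespace: 'span_derived', remoteService: '' },
        value: 5234,
      },
    ],
  },
]);
console.log(exampleText);
```

Leaving the timestamp off each sample line is what lets the scrape time become the sample time, so the served data never goes stale relative to the query window.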

Timestamp Alignment

All data sources use the same reference point (current system time):

  1. OpenSearch Data (traces/logs/services):

    • Original timestamps: ~Feb 5, 2026
    • Adjusted during upload: originalTime + (now - baseTime)
    • Result: All data is shifted to "now", with the relative offsets between records preserved
  2. Prometheus Metrics:

    • Metrics served without timestamps
    • Prometheus assigns scrape time (current time)
    • Result: All metrics timestamped at "now"
  3. Query Window:

    • Test uses: now - 1 day to now + 1 day
    • Captures all freshly-timestamped data

No coordination files or shared timestamps needed - everything naturally aligns to the system's current time.
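The shift formula and the query window above can be sketched as below. The function names `shiftTimestamp` and `queryWindow` are hypothetical, and the sketch assumes ISO 8601 timestamps and a `baseTime` chosen by the caller (for instance the latest timestamp in the dataset); the real logic lives in apm_data_helpers.js and may differ.

```javascript
// Hypothetical sketch of: originalTime + (now - baseTime).
// Function names and the ISO 8601 input format are assumptions.
function shiftTimestamp(originalIso, baseTimeMs, nowMs) {
  const shiftedMs = Date.parse(originalIso) + (nowMs - baseTimeMs);
  return new Date(shiftedMs).toISOString();
}

// Query window used by the tests: now - 1 day to now + 1 day.
function queryWindow(nowMs) {
  const dayMs = 24 * 60 * 60 * 1000;
  return {
    start: new Date(nowMs - dayMs).toISOString(),
    end: new Date(nowMs + dayMs).toISOString(),
  };
}

// Illustrative values: a span recorded 30 seconds before the base time
// ends up 30 seconds before "now".
const baseTimeMs = Date.parse('2026-02-05T12:00:00.000Z');
const nowMs = Date.parse('2026-03-26T09:30:00.000Z');
console.log(shiftTimestamp('2026-02-05T11:59:30.000Z', baseTimeMs, nowMs));
// prints 2026-03-26T09:29:30.000Z
console.log(queryWindow(nowMs));
```

Because every record is shifted by the same delta, durations and ordering between spans, logs, and service map entries are unchanged.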

Key Files

.cypress/
├── fixtures/
│   └── prometheus/
│       └── prometheus.yml                    # Prometheus scrape config (5s interval)
├── integration/
│   └── apm_test/
│       └── apm_services.spec.js              # Main APM test suite
├── utils/
│   ├── apm_data_helpers.js                   # Upload & timestamp adjustment logic
│   ├── metrics_server.js                     # HTTP server for Prometheus scraping
│   ├── start_local_prometheus.sh             # Local dev script (NEW)
│   ├── commands.osd.js                       # Enhanced workspace/dataset commands
│   ├── helpers.js                            # APM environment setup helpers
│   └── apm_data/                             # Test data (67K+ lines)
│       ├── metrics/                          # Prometheus metrics (JSON)
│       ├── traces/                           # OpenTelemetry spans (NDJSON)
│       ├── logs/                             # OpenTelemetry logs (NDJSON)
│       └── services/                         # Service map data (NDJSON)
└── support/
    └── commands.js                           # Cypress custom commands

.github/workflows/
├── integration-tests-workflow.yml            # CI: Auto-starts Prometheus for apm_test
└── ftr-e2e-dashboards-observability-test.yml # CI: OpenSearch 3.6.0 upgrade

Local Testing

Prerequisites

  1. Node.js - Already installed for OpenSearch Dashboards development
  2. Prometheus - Required for metrics collection

Running Tests Locally

Option 1: Automated Script (Recommended)

# Navigate to plugin directory
cd OpenSearch-Dashboards/plugins/dashboards-observability

# Start metrics server + Prometheus (runs in foreground)
./.cypress/utils/start_local_prometheus.sh

What this does:

  1. Starts metrics server on http://localhost:8080
  2. Starts Prometheus on http://localhost:9090 (scrapes metrics server every 5s)
  3. Waits for both services to be ready
  4. Verifies metrics are available
  5. Displays PIDs for cleanup
  6. Keeps running until you stop it (Ctrl+C)

In a separate terminal, run the tests:

cd OpenSearch-Dashboards/plugins/dashboards-observability

# Set Prometheus URL for tests
export PROMETHEUS_CONNECTION_URL=http://localhost:9090

# Run APM tests
yarn cypress:run --spec '.cypress/integration/apm_test/*.spec.js'

# Or open Cypress UI
yarn cypress:open

Cleanup:

# Kill both processes
kill $(cat /tmp/apm-test-pids.txt)

# Or just Ctrl+C in the terminal running the script

Option 2: Manual Setup

Terminal 1 - Metrics Server:

cd OpenSearch-Dashboards/plugins/dashboards-observability
node .cypress/utils/metrics_server.js

Terminal 2 - Prometheus:

cd OpenSearch-Dashboards/plugins/dashboards-observability
prometheus \
  --config.file=./.cypress/fixtures/prometheus/prometheus.yml \
  --storage.tsdb.path=/tmp/prometheus-apm-local \
  --web.listen-address=:9090

Terminal 3 - Tests:

cd OpenSearch-Dashboards/plugins/dashboards-observability
export PROMETHEUS_CONNECTION_URL=http://localhost:9090
yarn cypress:run --spec '.cypress/integration/apm_test/*.spec.js'

Verifying Setup

Check metrics server is serving data:

curl http://localhost:8080/metrics | head -20

Expected output:

# TYPE request counter
request{service="cart",namespace="span_derived",remoteService=""} 5234
request{service="frontend",namespace="span_derived",remoteService=""} 8912
# TYPE fault counter
fault{service="checkout",namespace="span_derived",remoteService=""} 125
...

Check Prometheus is scraping:

# Check for fault metrics
curl 'http://localhost:9090/api/v1/query?query=fault' | jq '.data.result | length'

# Should return a number > 0 (e.g., 15)
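The same check can be done programmatically, which is roughly what a "wait for Prometheus metrics" step has to do before the tests start. The sketch below is an assumption-laden illustration, not the helper used in this PR: `waitForMetric` and its options are hypothetical names, and the default relies on the global `fetch` available in Node.js 18 and later. It queries the Prometheus instant query endpoint (`/api/v1/query`) and retries until at least one series is returned.

```javascript
// Hypothetical sketch; not the helper used in this PR.
// Polls the Prometheus instant query API until the metric has at least one series.
async function waitForMetric(baseUrl, metricName, options = {}) {
  // fetchFn is injectable so the function can be exercised without a live server.
  const { attempts = 10, delayMs = 1000, fetchFn = fetch } = options;
  const url = `${baseUrl}/api/v1/query?query=${encodeURIComponent(metricName)}`;

  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchFn(url);
      const body = await response.json();
      const count = body.status === 'success' ? body.data.result.length : 0;
      if (count > 0) return count;
    } catch (err) {
      // Prometheus may not be listening yet; fall through and retry.
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`No "${metricName}" series in Prometheus after ${attempts} attempts`);
}

// Usage (requires a running Prometheus, so it is left as a comment):
// waitForMetric('http://localhost:9090', 'fault').then((n) => console.log(`${n} series`));
```

With a 5 second scrape interval, a one second delay and ten attempts leaves room for at least one completed scrape; both numbers are illustrative defaults.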

Check Prometheus UI:
Open http://localhost:9090 in your browser and run queries like:

  • fault{remoteService=""}
  • request{service="cart"}
  • sum by (service) (fault) / sum by (service) (request) * 100
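To make the last query concrete, here is a small sketch of what `sum by (service) (fault) / sum by (service) (request) * 100` computes over instant samples. The function names and the sample shape `{ labels, value }` are hypothetical, and the numbers are made up for illustration rather than taken from the test data.

```javascript
// Hypothetical sketch of the arithmetic behind the fault-rate query.
// Sample shape { labels: { service, ... }, value } is an assumption.

// Equivalent of "sum by (service) (...)": collapse all other labels.
function sumByService(samples) {
  const totals = new Map();
  for (const { labels, value } of samples) {
    totals.set(labels.service, (totals.get(labels.service) || 0) + value);
  }
  return totals;
}

function faultRatePercent(faultSamples, requestSamples) {
  const faults = sumByService(faultSamples);
  const requests = sumByService(requestSamples);
  const rates = {};
  for (const [service, faultTotal] of faults) {
    // Like PromQL vector matching, services missing on either side are dropped.
    if (!requests.has(service)) continue;
    rates[service] = (faultTotal / requests.get(service)) * 100;
  }
  return rates;
}

// Illustrative numbers only.
const rates = faultRatePercent(
  [
    { labels: { service: 'checkout', remoteService: '' }, value: 30 },
    { labels: { service: 'checkout', remoteService: 'cart' }, value: 20 },
  ],
  [{ labels: { service: 'checkout', remoteService: '' }, value: 200 }]
);
console.log(rates); // { checkout: 25 }
```

In PromQL, `/` and `*` share precedence and associate left to right, so the division happens before the multiplication by 100, as in the sketch.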

CI Behavior

GitHub Actions Workflow

The integration-tests-workflow.yml workflow now sets up Prometheus automatically when running the apm_test group:

Setup Steps (when matrix.testgroups == 'apm_test'):

  1. Downloads Prometheus 3.8.1
  2. Starts metrics server in background (port 8080)
  3. Starts Prometheus in background (port 9090)
  4. Waits 10 seconds for initial scrapes
  5. Verifies metrics are available
  6. Sets PROMETHEUS_CONNECTION_URL environment variable
  7. Tests run with full Prometheus integration

Time Investment:

  • Setup time: ~15 seconds
  • Saved time: No separate wait for metrics to accumulate (the tests' before hook waits until Prometheus is serving data)
  • Tests are fully isolated and repeatable

Test Data

Services in Sample Data

The test data includes a realistic microservices application with:

  • frontend-proxy (C++ ingress)
  • frontend (Node.js)
  • cart (.NET)
  • checkout (Go)
  • product-reviews (Python)
  • And more...

All services have fully correlated:

  • ✅ RED metrics in Prometheus (rate, error, duration)
  • ✅ Distributed traces in OpenSearch
  • ✅ Correlated logs in OpenSearch
  • ✅ Service map topology in OpenSearch

Data Size

  • Total test data: ~67,000 lines
  • Metrics: 41,000+ samples across 6 metric types
  • Traces: 50 correlated spans
  • Logs: 72 log entries
  • Service Map: 53 service connections

Testing Checklist

  • Local testing with script: .cypress/utils/start_local_prometheus.sh
  • Verified metrics server serves correct format
  • Verified Prometheus scrapes and stores metrics
  • Verified OpenSearch data upload with adjusted timestamps
  • Verified APM UI displays metrics correctly
  • Verified time range handling works across different execution times
  • CI workflow tested with OpenSearch 3.6.0
  • All existing tests still pass

Additional Context

This testing infrastructure follows the standard Prometheus scraping pattern (similar to the parent OpenSearch Dashboards repository's approach) rather than using complex Remote Write or backfill mechanisms. This makes it:

  • Easy to debug (just curl http://localhost:8080/metrics)
  • Standard and familiar to anyone who's used Prometheus
  • CI-friendly (no special dependencies beyond Prometheus binary)
  • Fast (no backfill generation or TSDB block creation)

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Adam Tackett added 4 commits March 26, 2026 12:00
Signed-off-by: Adam Tackett <tackadam@amazon.com>
Adam Tackett and others added 23 commits March 26, 2026 14:09
Adam Tackett and others added 10 commits April 4, 2026 17:02
@TackAdam TackAdam marked this pull request as ready for review April 6, 2026 22:20
Comment on lines +8 to +9
OPENSEARCH_VERSION: '3.6.0'
OPENSEARCH_PLUGIN_VERSION: '3.6.0.0-SNAPSHOT'
Member

can we pull the env version from package.json

Collaborator Author

Great suggestion

      - name: Checkout OpenSearch Dashboards
        uses: actions/checkout@v4
        with:
          repository: opensearch-project/Opensearch-Dashboards
          ref: ${{ env.OPENSEARCH_DASHBOARDS_VERSION }}
          path: OpenSearch-Dashboards

      - name: Get OpenSearch and plugin versions from package.json
        id: versions
        run: |
          VERSION=$(node -p "require('./OpenSearch-Dashboards/package.json').version")
          echo "OPENSEARCH_VERSION=$VERSION" >> $GITHUB_ENV
          echo "OPENSEARCH_PLUGIN_VERSION=${VERSION}.0-SNAPSHOT" >> $GITHUB_ENV
          echo "OpenSearch version: $VERSION"
          echo "Plugin version: ${VERSION}.0-SNAPSHOT"

      - name: Checkout dashboards observability
        uses: actions/checkout@v4
        with:
          path: OpenSearch-Dashboards/plugins/dashboards-observability

Signed-off-by: Adam Tackett <tackadam@amazon.com>
@TackAdam TackAdam merged commit f5cbda1 into opensearch-project:main Apr 7, 2026
24 of 35 checks passed

4 participants