groundcover Example Dashboards

A curated collection of ready-to-use groundcover dashboards that give you instant observability across services, infrastructure, databases, networking, logs, traces, and storage. Clone this repo and import them into your groundcover account to hit the ground running — or use them as inspiration for your own custom dashboards.

Getting Started

There are two ways to bring these dashboards into your groundcover environment: importing via the UI or managing them as code with Terraform.

Option 1 — Import via the groundcover UI

  1. Open your groundcover account and navigate to Dashboards.
  2. Click Create New Dashboard.
  3. Give the dashboard a name and, optionally, a description.
  4. Once inside the dashboard editor, click Actions → Import and paste (or upload) the contents of one of the JSON files from this repository.
  5. The widgets, layout, queries, and variables defined in the JSON will be loaded into the dashboard.

Tip: After import you can freely rearrange widgets, tweak queries, or add variables to match your environment.

Option 2 — Terraform (Infrastructure as Code)

The groundcover Terraform provider lets you create, update, and delete dashboards as code. This is the recommended approach for teams that version-control their observability configuration, promote dashboards across environments, or want drift detection.

Prerequisites

Requirement            Details
Terraform              >= 1.0
groundcover provider   >= 1.1.1 (registry.terraform.io/groundcover-com/groundcover)
API key                Generated in the groundcover UI under Settings → Access → API Keys

1. Configure the provider

terraform {
  required_providers {
    groundcover = {
      source  = "registry.terraform.io/groundcover-com/groundcover"
      version = ">= 1.1.1"
    }
  }
}

provider "groundcover" {
  api_key    = var.groundcover_api_key
  backend_id = var.groundcover_backend_id
}
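
The provider block above references two input variables; a minimal sketch of the matching declarations (the descriptions are illustrative):

variable "groundcover_api_key" {
  description = "groundcover API key (Settings → Access → API Keys)"
  type        = string
  sensitive   = true # keeps the key out of CLI output and state diffs
}

variable "groundcover_backend_id" {
  description = "Backend ID of the groundcover account to deploy into"
  type        = string
}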

2. Define a dashboard resource

Each JSON file in this repo maps to the preset field of a groundcover_dashboard resource. For example, to deploy the Service / Application Overview dashboard:

resource "groundcover_dashboard" "service_overview" {
  name        = "Service / Application Overview (Golden Signals)"
  description = "Golden signals — request rate, error rate, latency, and throughput for all workloads."
  preset      = file("${path.module}/service.json")
}

Note: The preset argument accepts the dashboard JSON as a string. Using file() keeps your Terraform code clean, but you can also inline the JSON or use jsonencode() if you want to template values.
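
As a sketch of the templating route, you can round-trip the shipped JSON through jsondecode()/jsonencode() and override values in HCL first. The duration override below is illustrative only; adjust the key path to whatever you actually want to template:

locals {
  # Decode the shipped preset so individual values can be overridden in HCL
  service_preset = jsondecode(file("${path.module}/service.json"))
}

resource "groundcover_dashboard" "service_overview_staging" {
  name        = "Service / Application Overview (staging)"
  description = "Golden signals, scoped to staging."
  # merge() replaces top-level keys of the decoded JSON before re-encoding
  preset      = jsonencode(merge(local.service_preset, { duration = "Last 3 hours" }))
}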

3. Apply

terraform init
terraform plan
terraform apply

Dashboards deployed via Terraform are marked Provisioned in the groundcover UI. They are read-only by default to protect your source of truth; you can unlock them for quick edits, but the next terraform apply will reconcile state.

4. Import an existing dashboard into Terraform state

If you already created one of these dashboards manually, bring it under Terraform management without recreating it:

terraform import groundcover_dashboard.service_overview <dashboard_id>
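
On Terraform 1.5+ you can alternatively declare the import in configuration and let plan/apply perform it; a sketch:

import {
  to = groundcover_dashboard.service_overview
  id = "<dashboard_id>"
}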

For more details, see the Managing Dashboards with Terraform documentation and the provider GitHub repo.


Dashboard Catalog

Service / Application Overview (Golden Signals)

File: service.json

The starting point for any investigation. This dashboard surfaces the four golden signals — request rate, error rate, latency, and throughput — using groundcover's workload-level and HTTP resource metrics. It opens with at-a-glance stat panels for current RPS, error percentage, and p99 latency, followed by time-series breakdowns of each signal. A bar chart of the top HTTP endpoints by request rate rounds out the view.

Key widgets: RPS stat, error rate stat, latency p99 stat, request rate over time, error rate over time, latency percentiles (p50/p95/p99), network throughput (RX+TX), top HTTP endpoints by RPS.

Variables: Cluster, Workload.

Use case: SRE on-call triage, service health checks, SLO monitoring, release validation.


Infrastructure / Node Health

File: infrastructure.json

Answers the question: is this a system problem or an application problem? This dashboard focuses entirely on host-level metrics — CPU usage, memory consumption, disk I/O throughput, filesystem utilization, network bytes in/out, and system load averages (1/5/15 min). Every panel is scoped to individual nodes so you can quickly isolate a noisy neighbor or a failing host.

Key widgets: CPU usage % by host, memory usage %, disk I/O read/write KB/s, disk space used %, network I/O bytes/sec by interface, load average (1/5/15).

Variables: Cluster, Host.

Use case: Infrastructure capacity reviews, node-level performance troubleshooting, correlating application issues with underlying host health.


Capacity / Saturation

File: capacity.json

Focused on resource saturation and limits. This dashboard tells you where you're running out of headroom before it becomes an outage: CPU throttling at the container level, memory pressure (both host and cgroup), filesystem and PVC utilization, disk I/O queue depth, connection stress (open vs. failed connection rates), and node allocatable CPU remaining.

Key widgets: Host CPU saturation %, container CPU throttling %, memory pressure (host + cgroup avg60), disk utilization %, PVC utilization %, disk queue depth / I/O utilization, connection stress (opened vs. failed rates), node allocatable CPU headroom (mCPU).

Variables: Cluster, Namespace.

Use case: Capacity planning, proactive scaling decisions, identifying throttled containers, preventing disk-full and OOM incidents.


Database / Data Store

File: database.json

Answers: is the data layer the bottleneck? This dashboard monitors SQL workload golden signals for PostgreSQL and MySQL — query/operation rate, latency percentiles (p50/p95/p99), error and rejected operation rates — alongside the container resource footprint of common data-store workloads (Postgres, MySQL, MariaDB, MongoDB, Redis, Cassandra, ClickHouse, Elasticsearch). It also tracks active connection rates, PVC storage headroom, and an efficiency proxy comparing successful vs. total SQL operations.

Key widgets: Query rate (PostgreSQL + MySQL), query latency p50/p95/p99, database container CPU (mCPU), database container memory (working set bytes), active connection rates, failed/rejected operations, success vs. total SQL ops (stacked area), PVC usage %.

Variables: Cluster, Namespace.

Use case: Database performance monitoring, slow-query investigation, connection pool health, storage capacity planning for stateful workloads.


Network / Service Map

File: network.json

Visualizes how services talk to each other and where communication is failing. This dashboard shows service-to-service RX and TX throughput (top edges), HTTP latency between client/server pairs (p95), HTTP error rates between services, cross-availability-zone traffic for cost awareness, and failed/refused connections as a symptom of dependency outages.

Key widgets: Service-to-service RX throughput (stacked area), service-to-service TX throughput, HTTP p95 latency by client/server pair, HTTP error rate between services, cross-AZ traffic (flagged edges), failed/refused connections.

Variables: Cluster, Workload, HTTP Client, HTTP Server.

Use case: Dependency mapping, microservice communication debugging, cross-AZ cost optimization, detecting downstream outages.


Logs Overview / Log Volume

File: logs.json

Answers: what changed when things broke? This dashboard provides an aggregate view of log volume across your environment — total log throughput over time, error and warning counts, info-level volume, a bar chart of the noisiest workloads, a pie chart breaking down severity levels, and a table of error log counts by workload for targeted investigation.

Key widgets: Log volume over time, error log count, warning log volume, info log volume, top workloads by log volume (bar), severity breakdown (pie), error log counts by workload (table).

Variables: Cluster, Workload.

Use case: Incident correlation, identifying noisy/error-prone services, log cost optimization, severity trend analysis.


Tracing / Distributed Requests

File: tracing.json

Answers: where exactly is the latency coming from? This dashboard combines live trace tables with metrics-based panels. A recent traces table lets you click into any trace ID for a full waterfall view. A bar chart highlights the slowest endpoints by p95 duration. Metrics panels cross-check HTTP resource latency (p95) and HTTP error rates by workload, while a service-dependency panel (network RX by partner workload) shows how traffic flows. An error-traces table filters to status:error for fast root-cause analysis.

Key widgets: Recent traces (table with trace_id links), slow endpoints by p95 duration (bar), service dependencies (RX bytes by partner), HTTP resource latency p95 (time-series), HTTP error rate by workload (time-series), error traces (filtered table).

Variables: None (unscoped by default — all clusters and workloads).

Use case: Latency root-cause analysis, distributed tracing exploration, error trace investigation, cross-service dependency visualization.


Claude Code / Tokenomics

File: claude_code.json

This dashboard is for Claude Code usage economics and behavior: spend, token mix, cache efficiency, model usage, API latency, prompts, tool calls, and per-session cost. It exists so engineering orgs can see who is spending what, whether prompt caching is paying off, and how models and tools are used in practice — alongside the rest of groundcover’s observability story for AI-assisted work.

Integration (groundcover + Claude Code): groundcover treats Claude Code as a first-class AI Tools data source. In the app, open Data Sources, pick the Claude Code wizard, and follow the steps: create or reuse an ingestion key, use the auto-filled backend address, and wire your environment.

On the Claude Code side, telemetry is exported via OpenTelemetry: enable it with CLAUDE_CODE_ENABLE_TELEMETRY=1 and send metrics and logs (structured events) over OTLP — e.g. OTEL_METRICS_EXPORTER=otlp and OTEL_LOGS_EXPORTER=otlp, with your collector or endpoint as documented in Claude Code monitoring. Metrics power counters such as cost and token usage, with labels like user_email; log events (queried with workload:"claude-code") carry API requests, prompts, and tool decisions. Traces are optional (separate beta flags in Claude Code) and are not required for this preset.
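
A minimal shell sketch of the Claude Code side, assuming a direct OTLP export (the endpoint and header values are placeholders; take the real backend address and ingestion key from the groundcover wizard):

# Enable Claude Code telemetry and export metrics + log events over OTLP
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp

# Standard OpenTelemetry exporter settings; substitute the values the
# groundcover Data Sources wizard generates (placeholders shown here)
export OTEL_EXPORTER_OTLP_ENDPOINT="https://<backend-address>"
export OTEL_EXPORTER_OTLP_HEADERS="apikey=<ingestion-key>"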

OTTL pipeline (recommended for log panels): Many Claude Code log lines arrive with structured fields embedded in the body string. Without an ingestion-time OTTL pipeline, those fields may not be available as first-class log attributes, which limits filtering, content: matching, and stats in this dashboard. Configure a rule in groundcover (for example on your collector or data-source ingestion settings, depending on your deployment) so that the relevant key/value material from the body is promoted into attributes before storage.

The example below groks the body, keeps snake_case keys (matching typical Claude Code event fields), merges them into attributes, and applies only when workload == "claude-code" and container_name is empty (adjust the conditions to match how your environment labels these logs):

ottlRules:
  - ruleName: Claude Code API Request
    statements:
      # Grok the raw body; extracted fields land in the temporary cache map
      - set(cache, ExtractGrokPatterns(body, "^%{GREEDYDATA:message}"))
      # Keep only snake_case keys, matching typical Claude Code event fields
      - keep_matching_keys(cache, "^[a-z_]+$")
      # Promote the surviving keys into first-class log attributes
      - merge_maps(attributes, cache, "insert")
      - set(format, "grok")
    conditions:
      # Scope the rule to Claude Code logs; adjust to your environment's labels
      - container_name == ""
      - workload == "claude-code"
    statementsErrorMode: propagate
    conditionLogicOperator: and

Key widgets: Seven-day stats — total USD cost, session count, cache hit rate, active time, average API latency, API call count; cost over time and cost share by model; token volume by type and cache read vs. input; API duration and requests by model; user prompts over time, API compute time, tool calls by name, top sessions by cost; recent API request events (model, tokens, cost, duration, session).

Variables: User Email (regex; default .* for org-wide).

Use case: AI spend governance, cache and model-mix optimization, spotting high-latency or expensive sessions, correlating tool usage with cost.


PVC Usage

File: pvc.json

A focused dashboard for Persistent Volume Claim storage monitoring. It opens with an overview section showing overall PVC usage as a stat panel alongside a bar chart of the top 10 PVCs by usage percentage. A trends section tracks PVC usage over time by name and namespace. A capacity alerts section surfaces critically full PVCs (above 80%) in a table with cluster, namespace, and storage class context so you can act before volumes fill up.

Key widgets: Overall PVC usage % (stat), top PVCs by usage % (bar), PVC usage over time (time-series), critically full PVCs >80% (table).

Variables: Namespace, Cluster.

Use case: Storage capacity planning, preventing volume-full outages, identifying PVCs approaching saturation across environments.


Dashboard JSON Structure

Each dashboard JSON file follows groundcover's native schema (currently schemaVersion: 7) and contains:

Field                 Description
name                  Dashboard display name
description           Short summary shown in the dashboard list
preset.widgets[]      Array of widget definitions — each has an id, type (widget, text, or section), name, queries[], and visualizationConfig
preset.layout[]       Grid positions (x, y, w, h) for each widget on a 24-column grid
preset.variables[]    Dashboard-level filter variables (cluster, namespace, workload, etc.) that scope queries dynamically
preset.duration       Default time range (e.g., "Last 1 hour")

Queries use either PromQL-style expressions (dataType: "metrics") for metrics data or gcQL pipeline expressions (dataType: "logs" / "traces") for logs and traces.
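
Putting the table together, a skeletal file might look like the sketch below. The field placement is reconstructed from the descriptions above (in particular, where schemaVersion sits and how layout entries reference widgets are assumptions); open any JSON file in this repo for the authoritative shape:

{
  "name": "Example Dashboard",
  "description": "Minimal illustration of the fields above",
  "preset": {
    "schemaVersion": 7,
    "duration": "Last 1 hour",
    "variables": [
      { "name": "cluster" }
    ],
    "widgets": [
      {
        "id": "w1",
        "type": "widget",
        "name": "Request rate",
        "queries": [{ "dataType": "metrics" }],
        "visualizationConfig": {}
      }
    ],
    "layout": [
      { "id": "w1", "x": 0, "y": 0, "w": 12, "h": 6 }
    ]
  }
}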

Customization

These dashboards are designed as starting points. Common modifications include:

  • Adjusting variables to match your cluster naming conventions or to add environment-level filters.
  • Filtering workloads by adding regex patterns to query label selectors (e.g., narrowing the database dashboard to only your production Postgres instances), as sketched after this list.
  • Changing time ranges from the default 1-hour window to match your monitoring cadence.
  • Adding widgets for custom metrics your applications expose through groundcover.
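
For the label-selector item above, PromQL regex matchers (=~) do the narrowing. The metric and label names below are hypothetical placeholders, not groundcover's actual series; lift the real expression from the widget you are editing:

# Hypothetical metric/label names for illustration only
sum by (workload) (rate(sql_operations_total{workload=~"postgres-prod-.*"}[5m]))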
