Skip to content

fix(observability): fix top performers data loss, dead guards, display bugs, and response time unit conversion#3794

Merged
crivetimihai merged 4 commits intomainfrom
fix/3580-top-performers-response-time-unit
Apr 6, 2026
Merged

fix(observability): fix top performers data loss, dead guards, display bugs, and response time unit conversion#3794
crivetimihai merged 4 commits intomainfrom
fix/3580-top-performers-response-time-unit

Conversation

@shoummu1
Copy link
Copy Markdown
Collaborator

Bug-fix PR

📌 Summary

Fixes five silent display/logic bugs in the Top Performers metrics panel (#3793) and resolves the response time unit inconsistency (seconds displayed as milliseconds) across all three monitoring panels in the Admin UI (#3580): Top Performers, KPI cards, and individual entity metrics cards.

🔗 Related Issues

Closes: #3793
Closes: #3580

🔁 Reproduction Steps

See #3793 for full steps. Minimal:

  1. Navigate to Admin UI → Metrics → Top Performers
  2. Observe "Last Used" column — raw ISO string (2026-03-13T06:00:00.000000) displayed
  3. Observe "Success Rate" for a tool with no executions — shows 0.0% with red badge instead of N/A
  4. Observe "Avg Response Time" in Top Performers and entity metrics cards — value shown in seconds (e.g. 1.194) instead of milliseconds (1194ms)
  5. Register a tool with 0% success rate — the panel treats it as if it has no metrics at all

🐞 Root Cause

Six bugs across three files:

# File Bug
1 metrics_common.py build_top_performers() used falsy checks (if result.success_rate, if result.avg_response_time) which coerced genuine 0.0 to None, losing the distinction between "no data" and actual zero
2 metrics_top_performers_partial.html Dead != 'N/A' guard — avgResponseTime is Optional[float], never the string 'N/A'; branch was unreachable
3 metrics_top_performers_partial.html Dead != 'Never' guard — lastExecution is None or an ISO string, never the string 'Never'; branch was unreachable
4 metrics_top_performers_partial.html {{ last_exec }} rendered the raw ISO-8601 datetime verbatim instead of a human-readable format
5 metrics_top_performers_partial.html item.get('successRate', 0) or 0 collapsed None (no executions) into 0, making un-executed tools display 0.0% with a red error badge
6 metrics_top_performers_partial.html, admin.js avg_response_time (stored in seconds) was rendered without ×1000 conversion across Top Performers template, KPI cards, entity metrics cards, paginated table rows, and CSV export

💡 Fix Description

mcpgateway/utils/metrics_common.py

  • Changed falsy checks to explicit is not None guards so 0.0 values are preserved:
    # Before
    avg_response_time=float(result.avg_response_time) if result.avg_response_time else None,
    success_rate=float(result.success_rate) if result.success_rate else None,
    # After
    avg_response_time=float(result.avg_response_time) if result.avg_response_time is not None else None,
    success_rate=float(result.success_rate) if result.success_rate is not None else None,

mcpgateway/templates/metrics_top_performers_partial.html

  • Removed dead != 'N/A' guard; None check is sufficient
  • Added ×1000 conversion for avgResponseTime (seconds → ms)
  • Replaced item.get('successRate', 0) or 0 with a proper None-aware check; renders N/A (no badge) for un-executed tools
  • Removed dead != 'Never' guard
  • Replaced verbatim {{ last_exec }} with {{ last_exec[:16] | replace('T', ' ') }} → e.g. 2026-03-13 06:00

mcpgateway/static/admin.js

  • extractKPIData: multiplied weighted average avgResponseTime by ×1000 before returning to KPI display
  • createTopPerformersTable / updateTableRows: ||?? (preserves 0.0) + ×1000 conversion applied consistently to both initial render and paginated updates
  • exportMetricsToCSV: same ?? + ×1000 fix so downloaded CSV values match the UI
  • createMetricsCard: added explicit avgResponseTime branch (×1000, .toFixed(1) ms) and lastExecutionTime branch (ISO slice → YYYY-MM-DD HH:mm)

🧪 Verification

Check Command Status
Lint suite make lint
Unit tests make test
Coverage ≥ 90 % make coverage
Manual regression no longer fails Tested on UI

Manual checks performed:

  • Top Performers "Avg Response Time": 1194ms
  • Top Performers "Last Used": 2026-03-13 06:00
  • Top Performers "Success Rate": 100.0% badge for executed tool ✅
  • Resources/Prompts/Servers tabs: N/A (no badge) for un-executed entities ✅
  • Tools Metrics card "Average Response Time": 1194.1 ms
  • Tools Metrics card "Last Execution Time": 2026-03-13 06:00

📐 MCP Compliance (if relevant)

  • Matches current MCP spec
  • No breaking change to MCP clients — display-only fixes, no API/schema changes

✅ Checklist

  • Code formatted (make black isort pre-commit)
  • No secrets/credentials committed

@shoummu1 shoummu1 requested a review from crivetimihai as a code owner March 23, 2026 08:43
@shoummu1 shoummu1 force-pushed the fix/3580-top-performers-response-time-unit branch from 86c42bd to ddf6b1a Compare March 24, 2026 05:41
@shoummu1 shoummu1 force-pushed the fix/3580-top-performers-response-time-unit branch from 07be570 to 342b76c Compare March 24, 2026 10:04
@shoummu1 shoummu1 added the bug Something isn't working label Mar 24, 2026
@shoummu1 shoummu1 force-pushed the fix/3580-top-performers-response-time-unit branch from 342b76c to 45faa1d Compare March 25, 2026 05:00
@shoummu1 shoummu1 changed the title fix(metrics): fix top performers data loss, dead guards, display bugs, and response time unit conversion fix(observability): fix top performers data loss, dead guards, display bugs, and response time unit conversion Mar 25, 2026
@shoummu1 shoummu1 force-pushed the fix/3580-top-performers-response-time-unit branch from 45faa1d to 8d2339b Compare March 27, 2026 05:15
@shoummu1
Copy link
Copy Markdown
Collaborator Author

shoummu1 commented Mar 27, 2026

Hi @mxwlantian, Can you please help me review this PR? It includes changes for #3580

@crivetimihai crivetimihai added MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe ui User Interface observability Observability, logging, monitoring labels Mar 29, 2026
@crivetimihai crivetimihai added this to the Release 1.0.0 milestone Mar 29, 2026
@crivetimihai
Copy link
Copy Markdown
Member

Thanks @shoummu1 — good catch on the data loss and unit conversion issues. Small, targeted fix. Please add the DCO Signed-off-by line.

@shoummu1 shoummu1 force-pushed the fix/3580-top-performers-response-time-unit branch 4 times, most recently from 439dd19 to 548164b Compare April 1, 2026 07:50
@crivetimihai crivetimihai force-pushed the fix/3580-top-performers-response-time-unit branch from 548164b to b879f83 Compare April 5, 2026 16:25
@crivetimihai
Copy link
Copy Markdown
Member

Maintainer review notes

Rebased onto current main and reviewed the full changeset. The core fixes are correct — confirmed avg_response_time is stored in seconds throughout the stack (DB columns, rollup service, schemas all document "seconds"), so the ×1000 display conversion is right.

Rebase conflict resolution

mcpgateway/static/admin.js was split into modules under mcpgateway/admin_ui/ by #3137. All admin.js changes were transplanted to mcpgateway/admin_ui/metrics.js. Test conflicts in admin-parsing.test.js resolved by keeping main's structure with updated assertions.

Changes made during review

Bug fix — CSV export zero-value handling:
The exportMetricsToCSV function used a truthiness check on the ?? result, which would treat avg_response_time=0 as N/A. Changed to != null for consistency with the table rendering code.

Dead code removal — createPerformanceCard:
The PR added avgResponseTime/lastExecutionTime display branches to createPerformanceCard, but that function's metrics list only contains memoryUsage, cpuUsage, diskIo, networkThroughput, cacheHitRate, and activeThreads — those branches were unreachable. Reverted to the original simple display logic.

Trailing whitespace fix in updateTableRows.

Test fixes:

  • Updated tests/unit/js/metrics.test.js assertion (150 → 150000) broken by the ×1000 conversion.
  • Added 4 JS tests for createMetricsCard: seconds→ms conversion, ISO datetime formatting, null handling, zero-value preservation.
  • Added Python test for build_top_performers verifying that 0.0 values for avg_response_time and success_rate are preserved (not coerced to None).

Verification

  • All 89 metrics JS tests pass
  • All 3 Python test_metrics_common tests pass
  • Full JS suite: 2005 passed, 3 skipped

shoummu1 and others added 4 commits April 6, 2026 18:08
…bugs

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
…ards

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
…e ms conversion

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
…bug, add differential tests

- Remove unreachable avgResponseTime/lastExecutionTime branches from
  createPerformanceCard (its metrics list only has memory/CPU/disk/network keys)
- Fix CSV export treating avg_response_time=0 as N/A by using != null
  instead of truthiness check, consistent with table rendering
- Fix trailing whitespace in updateTableRows
- Update modularized test assertion for ×1000 conversion (150 → 150000)
- Add JS tests for createMetricsCard: seconds→ms conversion, ISO datetime
  formatting, null handling, zero-value preservation
- Add Python test verifying build_top_performers preserves 0.0 values

Closes #3793, Closes #3580

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
@crivetimihai crivetimihai force-pushed the fix/3580-top-performers-response-time-unit branch from b879f83 to 4b137ec Compare April 6, 2026 17:39
@crivetimihai crivetimihai merged commit a318c07 into main Apr 6, 2026
34 of 35 checks passed
@crivetimihai crivetimihai deleted the fix/3580-top-performers-response-time-unit branch April 6, 2026 18:01
jonpspri pushed a commit that referenced this pull request Apr 10, 2026
…y bugs, and response time unit conversion (#3794)

* fix metrics top performers panel data loss, dead guards, and display bugs

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* convert avg_response_time seconds to ms in admin.js KPI and metrics cards

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* update vitest assertions and prettier formatting for avg_response_time ms conversion

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* fix: remove dead code from createPerformanceCard, fix CSV zero-value bug, add differential tests

- Remove unreachable avgResponseTime/lastExecutionTime branches from
  createPerformanceCard (its metrics list only has memory/CPU/disk/network keys)
- Fix CSV export treating avg_response_time=0 as N/A by using != null
  instead of truthiness check, consistent with table rendering
- Fix trailing whitespace in updateTableRows
- Update modularized test assertion for ×1000 conversion (150 → 150000)
- Add JS tests for createMetricsCard: seconds→ms conversion, ISO datetime
  formatting, null handling, zero-value preservation
- Add Python test verifying build_top_performers preserves 0.0 values

Closes #3793, Closes #3580

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
claudia-gray pushed a commit that referenced this pull request Apr 13, 2026
…y bugs, and response time unit conversion (#3794)

* fix metrics top performers panel data loss, dead guards, and display bugs

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* convert avg_response_time seconds to ms in admin.js KPI and metrics cards

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* update vitest assertions and prettier formatting for avg_response_time ms conversion

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>

* fix: remove dead code from createPerformanceCard, fix CSV zero-value bug, add differential tests

- Remove unreachable avgResponseTime/lastExecutionTime branches from
  createPerformanceCard (its metrics list only has memory/CPU/disk/network keys)
- Fix CSV export treating avg_response_time=0 as N/A by using != null
  instead of truthiness check, consistent with table rendering
- Fix trailing whitespace in updateTableRows
- Update modularized test assertion for ×1000 conversion (150 → 150000)
- Add JS tests for createMetricsCard: seconds→ms conversion, ISO datetime
  formatting, null handling, zero-value preservation
- Add Python test verifying build_top_performers preserves 0.0 values

Closes #3793, Closes #3580

Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>

---------

Signed-off-by: Shoumi <shoumimukherjee@gmail.com>
Signed-off-by: Mihai Criveti <crivetimihai@gmail.com>
Co-authored-by: Mihai Criveti <crivetimihai@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working MUST P1: Non-negotiable, critical requirements without which the product is non-functional or unsafe observability Observability, logging, monitoring ui User Interface

Projects

None yet

2 participants