
Add opt-in diagnostics endpoint and ui.diagnostics_view element#5867

Draft
Denubis wants to merge 1 commit intozauberzeug:mainfrom
Denubis:diagnostics-5660

Conversation

Contributor

@Denubis Denubis commented Mar 11, 2026

Motivation

When debugging NiceGUI applications, runtime state (asyncio tasks, memory usage, client connections, server configuration) requires ad-hoc inspection code.

This adds an opt-in diagnostics=True parameter to ui.run() that registers a JSON endpoint at /_nicegui/diagnostics, and a ui.diagnostics_view() element for in-page display.

We have not pre-discussed whether ui.diagnostics_view belongs in core (per CONTRIBUTING.md step 1 for new elements). Happy to move it to a separate package if you'd prefer. (Noting that this is stemming from the discussion in #5660.) I renamed it to diagnostics from "health" due to a bit of concept mismatch.

Implementation

  • diagnostics field on AppConfig, threaded through ui.run() and add_run_config()
  • nicegui/diagnostics.py: collect_snapshot() returns task counts grouped by coroutine, memory metrics (RSS from /proc/self/status and resource.getrusage), per-client detail (element count, outbox queue
    lengths, socket status), and server config (async_handlers, transports, reconnect_timeout, binding_refresh_interval)
  • /_nicegui/diagnostics endpoint registered only when diagnostics=True; module not imported otherwise
  • ui.diagnostics_view() composes Log + Button + optional Timer, calls collect_snapshot() directly (no HTTP request)
  • user_simulation gains **run_kwargs forwarding to ui.run() to enable testing with diagnostics=True
  • Memory metrics degrade gracefully on non-Linux/non-POSIX platforms via ImportError and contextlib.suppress(OSError) guards
  • 15 diagnostics tests + 2 user_simulation validation tests, all using User fixture

Progress

  • I chose a meaningful title that completes the sentence: "If applied, this PR will..."
  • The implementation is complete. (Nope, this is phase 1 draft, just so that @evnchn can comment)
  • If this PR addresses a security issue, it has been coordinated via the security advisory process.
  • Pytests have been added (or are not necessary).
  • Documentation has been added (or is not necessary).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@falkoschindler falkoschindler added this to the 3.x milestone Mar 11, 2026
@falkoschindler falkoschindler added feature Type/scope: New or intentionally changed behavior in progress Status: Someone is working on it labels Mar 11, 2026
Collaborator

@evnchn evnchn left a comment


Throwing stuff out there, but:

  1. Where do we pass the kwargs for the user fixture? Can it mirror the screen fixture?
  2. Keep client_id secret. A global-scope diagnostic should be admin-only, unless we have a censored public-facing global view.
  3. Considering that the diagnostics may bring a performance cost, we should consider partial deployment. Maybe diagnostics can be True for all clients, or a list of strings/objects/functions?

 @asynccontextmanager
 async def user_simulation(
-        root: Callable | None = None, *, main_file: str | os.PathLike | None = None,
+        root: Callable | None = None, *, main_file: str | os.PathLike | None = None, **run_kwargs: Any,
Collaborator


Not sure, but should we mirror the screen.ui_run_kwargs API? Can we do user.ui_run_kwargs?

Contributor Author


Human note: I'm reading it as: "Can we get these test fixtures to be able to call diagnostics=True" and "Can we maintain invocation 'spelling consistency' with ui_run_kwargs?" (This is like the 4th draft of this response)

Claude was... hesitant

❯ ... ok, and thus, from staring at this, can we not also pass diagnostics=True to user, such that we can get snapshots as the user output is running? This seems incredibly useful?
● Oh. That's a much better read of evnchn's comment than what I had.
Right now the User fixture (as opposed to user_simulation) is created via a pytest fixture in conftest. Let me check how that works.
● Searched for 1 pattern
● There it is. user_plugin.py:23-26 — the user pytest fixture calls user_simulation(main_file=...) but has no way to pass **run_kwargs through. So if you're writing a test with the user fixture (the standard way most upstream tests work), you can't pass diagnostics=True.
That's why our tests use user_simulation directly instead of the user fixture — we had to bypass the fixture to get kwargs through.
evnchn is saying: shouldn't the user fixture itself support this? Like screen.ui_run_kwargs, there should be a way to configure ui.run() kwargs when using the standard user fixture.
The fix would be something like adding ui_run_kwargs support to the user fixture in user_plugin.py. But that's a change to NiceGUI's public testing API, not just our diagnostics code. It's arguably a separate PR — or it's something the maintainers would want to own.

Is this what you were asking?

Prior to that, Claude's question was:

So the real question from evnchn might be: should we unify the spelling? E.g. should user_simulation also accept a dict, or should Screen also accept **kwargs? Or is the current state fine because both paths support arbitrary ui.run() kwargs?

(Sorry, just waking up) I think I (the human) am also rather confused here.

Collaborator


Considering selective diagnostic deployment, would composable ui.diagnostics.XXXX() UI elements be a better idea?

Contributor Author


Almost certainly. I just dumped everything into a log so that we can see it; this isn't a useful actual endpoint for diagnostics. Do you want me to work through rendering these in their own UI elements in this round? (I started out having no UI at all, but added a ui.log wrap at the last minute.) I'm more worried about whether these diagnostics represent the things you care about first.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evnchn as an example of how I'm using this code right now, https://github.com/MQFacultyOfArts/PromptGrimoireTool/blob/main/src/promptgrimoire/diagnostics.py (running into a memory leak somewhere, and I'm using this tooling for a bandaid)


@doc.demo('Global Scope', '''
By default the diagnostics view shows per-client detail (element count, socket status).
Set ``scope='global'`` to show a server-wide summary (total clients, connected count) instead.
Collaborator


Considering security, since client_id is a secret, I think we need to keep the global-scope diagnostic view behind password auth.

Ref: https://nicegui.io/documentation/section_security#client-side_secrets

Contributor Author

@Denubis Denubis Mar 12, 2026


Sure, this is ... dev mode, but yep. I didn't really have a sense of how to shuffle things around. Any cues for password auth and I'll drop them in. Specifically, I'm not sure how to do an admin check without some sort of auth infra. For me it's "have you turned diagnostics on by keyword?" (But agreed that that's flimsy and a footgun.)

Contributor Author


Current brainstorming:

  1. diagnostics=False (default): no endpoint, no exposure
  2. diagnostics=True, no token: endpoint active, but denylist applied — safe for dev, no accidental leaks if someone forgets to lock it down in prod
  3. diagnostic_token='...': implies diagnostics=True, full unredacted response when ?token=... matches, denylist response otherwise

The denylist would strip from the default response:

  • client_id values (verbose client list, per-client detail)
  • Task names in by_coroutine (could reveal route/handler names)
  • Outbox queue details (outbox_pending_updates, outbox_pending_messages)

Still exposed in denylist mode (aggregate, non-identifying):

  • Task totals
  • Memory RSS
  • Client counts (total/connected)
  • Config values (transports, reconnect_timeout, etc.)

Footgun protection. Hopefully the secret is an env var. Thoughts?

Comment thread nicegui/diagnostics.py
Comment on lines +139 to +140
client_id = request.query_params.get('client_id')
verbose = request.query_params.get('verbose', '').lower() in ('true', '1', 'yes')
Collaborator


I think this means the FastAPI docs will become generic, which is not ideal considering that LLMs rely heavily on openapi.json these days.

Contributor Author


Passing the buck on this one:

There's a timing problem. The endpoint_documentation loop runs in ui.run() at ui_run.py:184. Our add_route runs in the startup handler at nicegui.py:143, which fires after ui.run() completes. So the include_in_schema loop never sees our route.

So evnchn is right — our endpoint won't appear in openapi.json even with endpoint_documentation='internal', because the route is added too late.

But — this is also true for the favicon routes at nicegui.py:136-140. They use add_route in the same startup handler and have the same timing issue. So this isn't a problem we introduced; it's an existing pattern in the codebase.

I hate the expletive "but it was always there" slides Claude loves to do. ::sigh:: But yeah, passing the buck here as a human.

That said, there are two separate questions:

  1. Does the route appear in OpenAPI? No, because of the timing issue. But neither do favicons. To fix it, we'd either move the add_route earlier (before ui.run()'s loop) or use @app.get() with include_in_schema controlled explicitly.
  2. Is the response typed? No — get_diagnostics returns a plain JSONResponse(dict). To get a typed schema in OpenAPI, we'd need a Pydantic response model. That's additional work.

The endpoint follows the same add_route pattern as other /_nicegui/* routes (nicegui.py:136-140). It currently won't appear in OpenAPI regardless, due to startup timing. Adding OpenAPI visibility and a typed response model is a valid improvement but arguably a separate concern — happy to add it if the maintainers want it in this PR.

