Add opt-in diagnostics endpoint and ui.diagnostics_view element #5867
Denubis wants to merge 1 commit into zauberzeug:main
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 7e3ec75 to 0473e43
evnchn left a comment:
Throwing stuff out there, but:
- Where do we pass the kwargs for the user fixture? Can it mirror the screen fixture?
- Keep client_id secret. Global-scope diagnostic should be admin-only, unless we have a censored public-facing global view.
- Considering that the diagnostics may bring a performance cost, we should consider partial deployment. Maybe `diagnostics` can be True for all, or a list of strings/objects/functions?
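One way to read that last suggestion as a signature (purely hypothetical; nothing like this is in the PR):

```python
from typing import Callable

# Hypothetical reading of the suggestion: True enables all diagnostics,
# while a list of names/callables enables only selected collectors.
diagnostics: bool | list[str | Callable] = False
```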
```diff
 @asynccontextmanager
 async def user_simulation(
-    root: Callable | None = None, *, main_file: str | os.PathLike | None = None,
+    root: Callable | None = None, *, main_file: str | os.PathLike | None = None, **run_kwargs: Any,
```
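For context, a sketch of how a test can then pass run kwargs through (the import path is an assumption; the diff only shows the signature):

```python
from nicegui.testing import user_simulation  # import path assumed

async def test_with_diagnostics() -> None:
    # diagnostics=True is forwarded to ui.run() via the new **run_kwargs
    async with user_simulation(main_file='main.py', diagnostics=True) as user:
        await user.open('/')  # User API as in NiceGUI's testing docs
```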
Not sure, but should we mirror the screen.ui_run_kwargs API? Can we do user.ui_run_kwargs?
Human note: I'm reading it as: "Can we get these test fixtures to be able to call diagnostics=True" and "Can we maintain invocation 'spelling consistency' with ui_run_kwargs?" (This is like the 4th draft of this response)
Claude was... hesitant
❯ ... ok, and thus, from staring at this, can we not also pass diagnostics=True to user, such that we can get snapshots as the user output is running? This seems incredibly useful?
● Oh. That's a much better read of evnchn's comment than what I had.
Right now the User fixture (as opposed to user_simulation) is created via a pytest fixture in conftest. Let me check how that works.
● Searched for 1 pattern (ctrl+o to expand)
● There it is. user_plugin.py:23-26 — the user pytest fixture calls user_simulation(main_file=...) but has no way to pass **run_kwargs through. So if you're writing a test with the user fixture (the standard way most upstream tests work), you can't pass diagnostics=True.
That's why our tests use user_simulation directly instead of the user fixture — we had to bypass the fixture to get kwargs through.
evnchn is saying: shouldn't the user fixture itself support this? Like screen.ui_run_kwargs, there should be a way to configure ui.run() kwargs when using the standard user fixture.
The fix would be something like adding ui_run_kwargs support to the user fixture in user_plugin.py. But that's a change to NiceGUI's public testing API, not just our diagnostics code. It's arguably a separate PR — or it's something the maintainers would want to own.
Is this what you were asking?
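For concreteness, a sketch of what that fixture change could look like (hypothetical; fixture names chosen to mirror `screen.ui_run_kwargs`, not the actual user_plugin.py code):

```python
import pytest

from nicegui.testing import user_simulation  # import path assumed

@pytest.fixture
def user_run_kwargs() -> dict:
    """Override in a test module to forward kwargs (e.g. diagnostics=True) to ui.run()."""
    return {}

@pytest.fixture
async def user(user_run_kwargs: dict):
    # main_file resolution elided; the real fixture derives it from the test setup
    async with user_simulation(main_file='main.py', **user_run_kwargs) as simulation:
        yield simulation
```

A test module could then override `user_run_kwargs` to return `{'diagnostics': True}` and keep using the standard `user` fixture.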
Prior to that, Claude's question was:
So the real question from evnchn might be: should we unify the spelling? E.g. should user_simulation also accept a dict, or should Screen also accept **kwargs? Or is the current state fine because both paths support arbitrary ui.run() kwargs?
(Sorry, just waking up) I think I (the human) am also rather confused here.
Considering selective diagnostic deployment, are composable `ui.diagnostics.XXXX()` UI elements a better idea?
Almost certainly. I just dumped everything in a log because that way we can see it. This isn't a useful actual endpoint for diagnostics. Do you want me to work through rendering these in their own UI elements in this round? (I started out having no UI at all, but added a ui.log wrap at the last minute.) I'm mostly worried about whether these diagnostics represent things you care about in the first place.
@evnchn as an example of how I'm using this code right now: https://github.com/MQFacultyOfArts/PromptGrimoireTool/blob/main/src/promptgrimoire/diagnostics.py (I'm running into a memory leak somewhere, and I'm using this tooling as a bandaid).
```python
@doc.demo('Global Scope', '''
    By default the diagnostics view shows per-client detail (element count, socket status).
    Set ``scope='global'`` to show a server-wide summary (total clients, connected count) instead.
```
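i.e., per that docstring, usage would presumably look like:

```python
from nicegui import ui

ui.diagnostics_view(scope='global')  # server-wide summary instead of per-client detail
```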
Considering security, since client_id is a secret, I think we need to keep the global-scope diagnostic view behind password auth.
Ref: https://nicegui.io/documentation/section_security#client-side_secrets
Sure, this is ... dev mode, but yep. I didn't really have a sense of how to shuffle things around. Any cues for password auth and I'll drop them in. Specifically, I'm not sure how to do an admin check without some sort of auth infra. For me the check is "have you turned diagnostics on by keyword?" (But agreed that that's flimsy and a footgun.)
Current brainstorming:
- `diagnostics=False` (default): no endpoint, no exposure
- `diagnostics=True`, no token: endpoint active, but denylist applied — safe for dev, no accidental leaks if someone forgets to lock it down in prod
- `diagnostic_token='...'`: implies `diagnostics=True`; full unredacted response when `?token=...` matches, denylist response otherwise

The denylist would strip from the default response:
- `client_id` values (verbose client list, per-client detail)
- Task names in `by_coroutine` (could reveal route/handler names)
- Outbox queue details (`outbox_pending_updates`, `outbox_pending_messages`)

Still exposed in denylist mode (aggregate, non-identifying):
- Task totals
- Memory RSS
- Client counts (total/connected)
- Config values (transports, `reconnect_timeout`, etc.)
Footgun protection. Hopefully the secret is an env var. Thoughts?
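A minimal sketch of that gating, assuming the token comes from an env var and using illustrative snapshot keys (none of this is in the PR yet):

```python
import os
import secrets

from fastapi import Request
from fastapi.responses import JSONResponse

from nicegui.diagnostics import collect_snapshot  # added in this PR

DIAGNOSTICS_TOKEN = os.environ.get('NICEGUI_DIAGNOSTICS_TOKEN')  # assumed env-var secret

# Illustrative keys only; the real snapshot layout may differ.
DENYLIST = {'clients', 'by_coroutine', 'outbox_pending_updates', 'outbox_pending_messages'}

def redact(snapshot: dict) -> dict:
    """Keep only aggregate, non-identifying keys."""
    return {k: v for k, v in snapshot.items() if k not in DENYLIST}

async def get_diagnostics(request: Request) -> JSONResponse:
    snapshot = collect_snapshot()
    token = request.query_params.get('token')
    if DIAGNOSTICS_TOKEN and token and secrets.compare_digest(token, DIAGNOSTICS_TOKEN):
        return JSONResponse(snapshot)      # full, unredacted response
    return JSONResponse(redact(snapshot))  # denylist response by default
```

`secrets.compare_digest` avoids a timing side channel on the token check.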
```python
client_id = request.query_params.get('client_id')
verbose = request.query_params.get('verbose', '').lower() in ('true', '1', 'yes')
```
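(For reference, a client would then query the endpoint roughly like this; `client_id` deliberately omitted given the secrecy discussion above:)

```python
import httpx

# Hypothetical local dev URL; the endpoint path is from this PR.
response = httpx.get('http://localhost:8080/_nicegui/diagnostics',
                     params={'verbose': 'true'})
print(response.json())
```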
I think this means the FastAPI docs will become generic, which is not ideal considering how much LLMs hinge on openapi.json nowadays.
Passing the buck on this one:
There's a timing problem. The endpoint_documentation loop runs in ui.run() at ui_run.py:184. Our add_route runs in the startup handler at nicegui.py:143, which fires after ui.run() completes. So the include_in_schema loop never sees our route.
So evnchn is right — our endpoint won't appear in openapi.json even with endpoint_documentation='internal', because the route is added too late.
But — this is also true for the favicon routes at nicegui.py:136-140. They use add_route in the same startup handler and have the same timing issue. So this isn't a problem we introduced; it's an existing pattern in the codebase.
I hate the "but it was always there" slides Claude loves to do. ::sigh:: But yeah, passing the buck here as a human.
That said, there are two separate questions:
- Does the route appear in OpenAPI? No, because of the timing issue. But neither do favicons. To fix it, we'd either move the add_route earlier (before ui.run()'s loop) or use @app.get() with include_in_schema controlled explicitly.
- Is the response typed? No — get_diagnostics returns a plain JSONResponse(dict). To get a typed schema in OpenAPI, we'd need a Pydantic response model. That's additional work.
The endpoint follows the same add_route pattern as other /_nicegui/* routes (nicegui.py:136-140). It currently won't appear in OpenAPI regardless, due to startup timing. Adding OpenAPI visibility and a typed response model is a valid improvement but arguably a separate concern — happy to add it if the maintainers want it in this PR.
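If the maintainers do want OpenAPI visibility, a sketch of the `@app.get()` variant with a typed response (illustrative model; field names assumed, not the PR's actual snapshot schema):

```python
from pydantic import BaseModel

from nicegui import app
from nicegui.diagnostics import collect_snapshot  # added in this PR

class DiagnosticsSummary(BaseModel):
    """Illustrative subset of the snapshot; real keys may differ."""
    task_total: int
    memory_rss_bytes: int
    clients_total: int
    clients_connected: int

# Declared at import time, before ui.run(), so the endpoint_documentation
# loop and openapi.json both see the route.
@app.get('/_nicegui/diagnostics', response_model=DiagnosticsSummary, include_in_schema=True)
async def get_diagnostics() -> DiagnosticsSummary:
    return DiagnosticsSummary(**collect_snapshot())  # assumes matching keys
```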
Motivation
When debugging NiceGUI applications, inspecting runtime state (asyncio tasks, memory usage, client connections, server configuration) requires ad-hoc code.
This adds an opt-in `diagnostics=True` parameter to `ui.run()` that registers a JSON endpoint at `/_nicegui/diagnostics`, and a `ui.diagnostics_view()` element for in-page display.

We have not pre-discussed whether `ui.diagnostics_view` belongs in core (per CONTRIBUTING.md step 1 for new elements). Happy to move it to a separate package if you'd prefer. (Noting that this is stemming from the discussion in #5660.) I renamed it from "health" to "diagnostics" due to a bit of concept mismatch.
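For concreteness, minimal usage as described above:

```python
from nicegui import ui

ui.diagnostics_view()  # in-page view: Log + refresh Button (+ optional Timer)

ui.run(diagnostics=True)  # also serves JSON at /_nicegui/diagnostics
```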
Implementation

- `diagnostics` field on `AppConfig`, threaded through `ui.run()` → `add_run_config()`
- `nicegui/diagnostics.py`: `collect_snapshot()` returns task counts grouped by coroutine, memory metrics (RSS from `/proc/self/status` and `resource.getrusage`), per-client detail (element count, outbox queue lengths, socket status), and server config (`async_handlers`, transports, `reconnect_timeout`, `binding_refresh_interval`)
- `/_nicegui/diagnostics` endpoint registered only when `diagnostics=True`; module not imported otherwise
- `ui.diagnostics_view()` composes Log + Button + optional Timer, calls `collect_snapshot()` directly (no HTTP request)
- `user_simulation` gains `**run_kwargs` forwarding to `ui.run()` to enable testing with `diagnostics=True`
- `ImportError` and `contextlib.suppress(OSError)` guards
- `user_simulation` validation tests, all using the `User` fixture

Progress