Store the LoggingContext in a ContextVar #18871

MadLittleMods wants to merge 9 commits into `develop`
```python
# TODO: This function is a no-op now and should be removed in a follow-up PR.
def make_deferred_yieldable(deferred: "defer.Deferred[T]") -> "defer.Deferred[T]":
```
`make_deferred_yieldable` no longer does anything (it's a no-op), but there are a lot of references to clean up. I think it would be better to do this in a follow-up PR than to bulk up this diff with changes that would cloud the main change we're trying to introduce.
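Since the logcontext now lives in a `ContextVar` that travels with the awaiting code, there is nothing left for `make_deferred_yieldable` to reset. A minimal sketch of what the no-op reduces to, assuming it simply returns its argument (the only sensible behaviour for a no-op with this signature). A plain `TypeVar` replaces `defer.Deferred[T]` so the sketch runs without Twisted:

```python
from typing import TypeVar

T = TypeVar("T")


def make_deferred_yieldable(deferred: T) -> T:
    """Sketch of the no-op: hand the Deferred straight back.

    The real signature is typed with ``defer.Deferred[T]``; a bare TypeVar
    is used here only so this example runs without Twisted installed.
    """
    # Nothing to do: the ContextVar already travels with the code that
    # awaits the Deferred, so no logcontext needs resetting first.
    return deferred
```

Keeping the function as an identity means every existing call site stays valid until the follow-up PR removes them.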
> Ideally, nothing from the Synapse homeserver would be logged against the `sentinel` context, as we want to know where the logs came from. In practice, this is not always the case yet, especially outside of request handling.
Over time, we can remove PreserveLoggingContext from many scenarios that cause the sentinel context to be used.
I've already started this separately in #18870
```
@@ -1,250 +0,0 @@
#
```
As far as I can tell, these checks provide no value to us anymore. We no longer have specific log context rules to worry about, and the `ContextVar` properly follows the context regardless.
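What "the `ContextVar` properly follows the context" means in practice, sketched with plain asyncio rather than Twisted so it runs standalone. The variable `current` and the string values are hypothetical stand-ins for the real logcontext. A new task starts from a copy of its creator's context, keeps its own value across awaits, and its changes never leak back to the creator:

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the logcontext variable; strings replace LoggingContext.
current: ContextVar[str] = ContextVar("current", default="sentinel")


async def child(seen: list) -> None:
    # Inherited from the parent's context at the moment the task was created.
    seen.append(("child start", current.get()))
    current.set("child")
    # Suspending and resuming does not lose the task's own value.
    await asyncio.sleep(0)
    seen.append(("child after await", current.get()))


async def parent() -> list:
    seen: list = []
    current.set("parent")
    await asyncio.create_task(child(seen))
    # The child's set() only touched the child's copy of the context.
    seen.append(("parent after child", current.get()))
    return seen


seen = asyncio.run(parent())
print(seen)
```

Because this bookkeeping is done by the runtime, there is no need for extra checks that the logcontext was restored after each `yield`/`await`.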
```diff
 self.assertEqual(
     current_context(),
-    SENTINEL_CONTEXT,
+    c1,
```
As far as I can tell, these new context values make sense. We're in context c1, so current_context() should be c1.
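Why `c1` is the right expectation: with a `ContextVar`, entering a logcontext sets the variable and exiting restores it, so inside the block `current_context()` is simply the context being entered. A minimal sketch, assuming a hypothetical stripped-down `LoggingContext` that uses `ContextVar.set()` and `ContextVar.reset()` tokens. The real class does more (resource usage tracking among other things) and may restore the previous context differently:

```python
from contextvars import ContextVar
from typing import Optional


class LoggingContext:
    """Hypothetical, trimmed-down stand-in for Synapse's LoggingContext."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._token: Optional[object] = None

    def __enter__(self) -> "LoggingContext":
        # Remember how to undo this change when the block exits.
        self._token = _current_context.set(self)
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Restore whatever was current before __enter__.
        _current_context.reset(self._token)


SENTINEL_CONTEXT = LoggingContext("sentinel")
_current_context: ContextVar[LoggingContext] = ContextVar("current_context")


def current_context() -> LoggingContext:
    return _current_context.get(SENTINEL_CONTEXT)


with LoggingContext("c1") as c1:
    inside = current_context()  # the context we just entered

outside = current_context()  # back to the sentinel
```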
```python
if bool(os.environ.get("SYNAPSE_TEST_PATCH_LOG_CONTEXTS", False)):
    # We import here so that we don't have to install a bunch of deps when
    # running the packaging tox test.
    from synapse.util.patch_inline_callbacks import do_patch

    do_patch()
```
See #18871 (comment) for why we've removed the patch_inline_callbacks
When we `daemonize`, we fork the process and the CPU time metrics get confused:
per-thread resource usage appears to go backwards because we're comparing the
resource usage (`rusage`) recorded in the original process with the usage
recorded in the forked process.
We now kick off the background tasks (`run_as_background_process`) after we
have forked the process, so the `rusage` we record when we `start` comes from
the same thread as the one we record when we `stop`.
Bad log examples from before:
```
synapse.logging.context - ERROR - _schedule_next_expiry-0 - utime went backwards! 0.050467 < 0.886526
synapse.logging.context - ERROR - _schedule_db_events-0 - stime went backwards! 0.009941 < 0.155106
synapse.logging.context - ERROR - wake_destinations_needing_catchup-0 - stime went backwards! 0.010175 < 0.130923
synapse.logging.context - ERROR - resume_sync_partial_state_room-0 - utime went backwards! 0.052898 < 0.886526
```
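These errors come from comparing two `rusage` readings. The shape of that check can be sketched as below. `Usage` and `usage_delta` are hypothetical names, not Synapse's API, and clamping the result to zero is this sketch's own choice. The usage lines feed it a start `utime` taken before the fork and a stop `utime` taken after it, using the values from the first log line above:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class Usage:
    """Hypothetical pair of CPU times (seconds), like ru_utime/ru_stime."""

    utime: float
    stime: float


def usage_delta(start: Usage, end: Usage) -> Usage:
    """Return CPU time used between two readings, flagging negative deltas."""
    utime_delta = end.utime - start.utime
    stime_delta = end.stime - start.stime
    if utime_delta < 0:
        logger.error("utime went backwards! %f < %f", end.utime, start.utime)
    if stime_delta < 0:
        logger.error("stime went backwards! %f < %f", end.stime, start.stime)
    # This sketch clamps to zero so a bogus pair never yields negative usage.
    return Usage(max(utime_delta, 0.0), max(stime_delta, 0.0))


# Start reading from the original process, stop reading from the fork.
before_fork = Usage(utime=0.886526, stime=0.0)
after_fork = Usage(utime=0.050467, stime=0.0)
delta = usage_delta(before_fork, after_fork)
```

Taking both readings in the same process (and thread) after the fork avoids the negative delta entirely.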
Testing strategy:
1. Run with `daemonize: true` in your `homeserver.yaml`
1. `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Shutdown the server
1. Look for any bad log entries in your homeserver logs:
   - `Expected logging context sentinel but found main`
   - `Expected logging context main was lost`
   - `utime went backwards!`/`stime went backwards!`
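The last step can be scripted. A minimal sketch that filters log lines for the patterns listed above; `find_bad_lines` is a hypothetical helper, and the `homeserver.log` path in the usage comment is an assumption, so point it at wherever your logging config writes:

```python
from typing import Iterable, List

# Patterns taken from the testing strategy above.
BAD_PATTERNS = (
    "Expected logging context sentinel but found main",
    "Expected logging context main was lost",
    "utime went backwards!",
    "stime went backwards!",
)


def find_bad_lines(lines: Iterable[str]) -> List[str]:
    """Return every log line that contains one of the known-bad patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern in line for pattern in BAD_PATTERNS)
    ]


# Usage (the path is an assumption):
# with open("homeserver.log") as f:
#     for line in find_bad_lines(f):
#         print(line)
```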
```python
    return _current_context.get(SENTINEL_CONTEXT)


def set_current_context(context: LoggingContextOrSentinel) -> LoggingContextOrSentinel:
```
TODO: The docstring needs to be updated
```diff
-_thread_local = threading.local()
-_thread_local.current_context = SENTINEL_CONTEXT
+_current_context: ContextVar[LoggingContextOrSentinel] = ContextVar("current_context")
```
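Pieced together, the accessor pair looks roughly like this. A minimal sketch, assuming a bare-bones `LoggingContext` and a separate sentinel class (both hypothetical stand-ins). Judging by the debug log further down, the real `set_current_context` also stops the outgoing context and starts the incoming one; that bookkeeping is left out here:

```python
from contextvars import ContextVar
from typing import Union


class LoggingContext:
    """Hypothetical minimal stand-in; the real class carries much more state."""

    def __init__(self, name: str) -> None:
        self.name = name


class _Sentinel:
    """Stand-in for the context used when no logcontext is active."""

    name = "sentinel"


SENTINEL_CONTEXT = _Sentinel()
LoggingContextOrSentinel = Union[LoggingContext, _Sentinel]

# No default on the ContextVar itself; the sentinel is supplied at read time,
# mirroring `_current_context.get(SENTINEL_CONTEXT)` above.
_current_context: ContextVar[LoggingContextOrSentinel] = ContextVar("current_context")


def current_context() -> LoggingContextOrSentinel:
    return _current_context.get(SENTINEL_CONTEXT)


def set_current_context(context: LoggingContextOrSentinel) -> LoggingContextOrSentinel:
    """Make `context` current and return the context that was current before."""
    previous = current_context()
    _current_context.set(context)
    return previous


request_context = LoggingContext("POST-0")
previous = set_current_context(request_context)  # nothing set yet: the sentinel
during = current_context()
set_current_context(previous)  # restore
```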
Error: Called stop on logcontext POST-0 without recording a start rusage
There is a problem where the POST-0 LoggingContext is somehow becoming the current context without a corresponding set_current_context(POST-0) call.
See the lines marked red in the snippet below.
```shell
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_rooms.RoomStateTestCase.test_get_state_event_cancellation
```

`_trial_temp/test.log`

```diff
2025-09-02 19:12:06-0500 [-] synapse.http.site - 304 - INFO - sentinel - asdf SynapseRequest render
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - sentinel - asdf PreserveLoggingContext(POST-0).__enter__ nonce=meZAG
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(POST-0) (previous=sentinel)
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - POST-0 - asdf LoggingContext(POST-0).start usage_start=True
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 675 - INFO - POST-0 - asdf PreserveLoggingContext(POST-0).__exit__ nonce=meZAG restoring old_context=sentinel
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - POST-0 - asdf set_current_context(sentinel) (previous=POST-0)
+ 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - POST-0 - asdf LoggingContext(POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - sentinel - asdf PreserveLoggingContext(sentinel).__enter__ nonce=xQsLm
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(sentinel) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - _handle_new_device_update_async-0 - asdf LoggingContext(_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 675 - INFO - sentinel - asdf PreserveLoggingContext(sentinel).__exit__ nonce=xQsLm restoring old_context=sentinel
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(sentinel) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-_handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=db-_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-_handle_new_device_update_async-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-_handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=db-_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-_handle_new_device_update_async-0 - asdf LoggingContext(db-_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - _handle_new_device_update_async-0 - asdf set_current_context(sentinel) (previous=_handle_new_device_update_async-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - _handle_new_device_update_async-0 - asdf LoggingContext(_handle_new_device_update_async-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - sentinel - asdf set_current_context(db-POST-0) (previous=sentinel)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 454 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).start usage_start=True
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - db-POST-0 - asdf set_current_context(sentinel) (previous=db-POST-0)
2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - db-POST-0 - asdf LoggingContext(db-POST-0).stop usage_start=True rusage=True
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 662 - INFO - POST-0 - asdf PreserveLoggingContext(sentinel).__enter__ nonce=JvLjU
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 723 - INFO - POST-0 - asdf set_current_context(sentinel) (previous=POST-0)
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 471 - INFO - POST-0 - asdf LoggingContext(POST-0).stop usage_start=False rusage=True
- 2025-09-02 19:12:06-0500 [-] synapse.logging.context - 488 - ERROR - POST-0 - asdf Called stop on logcontext POST-0 without recording a start rusage
```
Perhaps this is a Twisted bug?
I was under the impression that Twisted supported `ContextVar`s, but now I'm not sure. All of the issues mentioned in matrix-org/synapse#10342 are resolved, but there are other things in the Twisted tracker:
Unresolved issues:
- Support contextvars in Deferred twisted/twisted#9807
- Support contextvars in DelayedCall twisted/twisted#9824
Resolved issues:
- Support Contextvars in coroutines (inlineCallbacks/ensureDeferred) twisted/twisted#9719
- Contextvars support does not support `.reset` twisted/twisted#10301
And we could even be running into something unreported 🤷 Need to investigate more.
From some more debugging, I think the ContextVar is acting normally.
It may simply be that the `LoggingContext` start/stop pattern isn't compatible with the `ContextVar` we're using now, in which case we'd still have to maintain the log context rules 🤔.
In this case, `stop` is being called on `SynapseRequest.logcontext`. We wrap `SynapseRequest.render` in a `PreserveLoggingContext`, but `render` only kicks off the request handling and doesn't wait for it to finish, so the context is stopped long before the request is done, and it is never restarted. When downstream code later uses other `LoggingContext` utilities that call `set_current_context`, they stop the already-stopped `SynapseRequest.logcontext` a second time.
I go back and forth on whether we can update things to work correctly. If I naively try to manage the lifetime myself by calling self.logcontext.__enter__ manually in SynapseRequest.render, it still doesn't work out.
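A stripped-down model of that failure, using asyncio instead of Twisted so it runs standalone. `Ctx`, `_Sentinel` and this `PreserveLoggingContext` are hypothetical simplifications, with a boolean standing in for the recorded start `rusage`. The outer block only kicks off the work; the spawned task copies the context while `POST-0` is current, so when the block exits and stops `POST-0`, the task is still running in it, and the next context switch inside the task stops it again:

```python
import asyncio
from contextvars import ContextVar
from typing import List, Optional, Union

errors: List[str] = []


class Ctx:
    """Hypothetical logcontext; `started` stands in for the recorded start rusage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False

    def start(self) -> None:
        self.started = True

    def stop(self) -> None:
        if not self.started:
            errors.append(
                f"Called stop on logcontext {self.name} without recording a start rusage"
            )
        self.started = False


class _Sentinel:
    """The sentinel never tracks usage, so start/stop do nothing."""

    name = "sentinel"

    def start(self) -> None:
        pass

    def stop(self) -> None:
        pass


SENTINEL = _Sentinel()
current: ContextVar[Union[Ctx, _Sentinel]] = ContextVar("current", default=SENTINEL)


def set_current_context(ctx: Union[Ctx, _Sentinel]) -> Union[Ctx, _Sentinel]:
    previous = current.get()
    if previous is not ctx:
        previous.stop()
        ctx.start()
    current.set(ctx)
    return previous


class PreserveLoggingContext:
    """Switch to `new_context` for the block, then switch back."""

    def __init__(self, new_context: Union[Ctx, _Sentinel]) -> None:
        self._new = new_context
        self._old: Optional[Union[Ctx, _Sentinel]] = None

    def __enter__(self) -> None:
        self._old = set_current_context(self._new)

    def __exit__(self, *exc_info: object) -> None:
        assert self._old is not None
        set_current_context(self._old)


async def rest_of_request() -> None:
    await asyncio.sleep(0)  # the request is still in flight here
    # Downstream code drops to the sentinel, e.g. around some wait.
    with PreserveLoggingContext(SENTINEL):
        pass


async def main() -> None:
    request = Ctx("POST-0")
    with PreserveLoggingContext(request):
        # Only *kicks off* the work. The task copies the current context,
        # so `request` stays its logcontext after this block exits.
        task = asyncio.create_task(rest_of_request())
    # Leaving the block already stopped `request`; the task still runs in it.
    await task


asyncio.run(main())
print(errors)
```

The `ContextVar` itself behaves as designed here; the mismatch is between a context whose value outlives the `with` block and start/stop bookkeeping that assumes it does not.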
```python
# Register background tasks required by this server. This must be done
# somewhat manually due to the background tasks not being registered
# unless handlers are instantiated.
if hs.config.worker.run_background_tasks:
    hs.start_background_tasks()
```
Split this change out into #18886 since it seems good in any case, and this PR may get stale.

Spawning from #18871

[This change](6ce2f3e) was originally used to fix CPU time going backwards when we `daemonize`. While we don't seem to run into this problem on `develop`, I still think this is a good change to make. We don't need background tasks running on a process that will soon be forcefully exited and where the reactor isn't even running yet. We now kick off the background tasks (`run_as_background_process`) after we have forked the process and started the reactor. Also, as a simple note, we don't need background tasks running in both halves of a fork.
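The intended startup ordering can be sketched as follows. `FakeReactor`, `daemonize` and `start_background_tasks` here are hypothetical placeholders that only record the order of events (no real fork happens), and `call_when_running` is modelled loosely on Twisted's `reactor.callWhenRunning`:

```python
from typing import Callable, List

events: List[str] = []


class FakeReactor:
    """Hypothetical stand-in that queues callbacks until run() is called."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []

    def call_when_running(self, fn: Callable[[], None]) -> None:
        self._pending.append(fn)

    def run(self) -> None:
        events.append("reactor running")
        for fn in self._pending:
            fn()


def daemonize() -> None:
    # A real daemonize forks; afterwards only the child keeps running.
    events.append("forked")


def start_background_tasks() -> None:
    # Starting here means start/stop rusage readings come from the same process.
    events.append("background tasks started")


def main(reactor: FakeReactor) -> None:
    # Defer the background tasks instead of starting them during setup.
    reactor.call_when_running(start_background_tasks)
    daemonize()
    reactor.run()


main(FakeReactor())
print(events)
```

Registering the start as a deferred callback, rather than calling it inline during setup, is what keeps the background tasks out of the pre-fork process.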
Closing as I've decided to continue trudging in the
This is a first step towards a `ContextVar` based `LoggingContext`. This PR only goes as far as storing the `LoggingContext` in a `ContextVar` instead of a thread-local. That still gives us the benefit of being able to remove the painful log context rule complexity around making sure the thread-local is set correctly as awaitables are suspended and resumed in the Twisted reactor.

Part of #10342 (previously matrix-org/synapse#10342)
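The thread-local problem the log context rules were working around, in miniature. This sketch uses asyncio rather than Twisted so it runs standalone, and plain strings stand in for `LoggingContext`. Two requests interleave on one thread: the thread-local is clobbered by whichever request ran last, while the `ContextVar` keeps each request's own value:

```python
import asyncio
import threading
from contextvars import ContextVar

_thread_local = threading.local()
_context_var: ContextVar[str] = ContextVar("current_context", default="sentinel")


async def handle(name: str, results: dict) -> None:
    _thread_local.current = name
    _context_var.set(name)
    # Suspend: the other request runs on the same thread in the meantime.
    await asyncio.sleep(0)
    # Record what each mechanism reports once this request resumes.
    results[name] = (_thread_local.current, _context_var.get())


async def main() -> dict:
    results: dict = {}
    await asyncio.gather(handle("req-1", results), handle("req-2", results))
    return results


results = asyncio.run(main())
print(results)
```

With a thread-local, every suspension point needs code that saves and restores the value by hand, which is what the log context rules enforced; the `ContextVar` gets this from the runtime.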
This is purely based on @sandhose's branch which I've just picked up, kicked the tires, and brought forward to propose and merge.
This is spawning from adding `server_name` to the `LoggingContext` and finding that we use the `sentinel` `LoggingContext` in many places (which means the `server_name` isn't tracked in those places). Removing the `sentinel` `LoggingContext` from a few places uncovered some places where we don't seem to be following the log context rules, so things are getting messed up. Instead of trying to adapt a bunch of tricky areas to follow the rules, I decided to remove the need for the log context rules altogether and refactor to a `ContextVar` based `LoggingContext`.

Testing strategy
1. Run with `daemonize: true`:
   `poetry run synapse_homeserver --config-path homeserver.yaml`
1. Look for any bad log entries:
   - `Expected logging context sentinel but found main`
   - `Expected logging context main was lost`
   - `Expected previous context`
   - `utime went backwards!`/`stime went backwards!`
   - `Called stop on logcontext POST-0 without recording a start rusage`

Todo

- `docs/log_contexts.md`
- `tests/util/caches/test_descriptors.py` (`synapse/util/patch_inline_callbacks.py`)
- `synapse/rust/src/http_client.rs` (lines 247 to 251 in c68c5dd)
Future:

- Remove `PreserveLoggingContext` from many scenarios
- `sentinel` logcontext where we log in `setup`, `start` and exit #18870
- Remove `make_deferred_yieldable`