Fix lost logcontext when using `timeout_deferred(...)` #19090
MadLittleMods merged 21 commits into develop from …
This reverts commit 4b9441e.
```python
with PreserveLoggingContext():
    deferred.cancel()
```
This is the main fix!
See the Deferred callbacks section of our logcontext docs for more info (specifically using solution 2).
Heads-up: I wrote the docs too, so it's my assumptions/understanding all the way down. Apply your own scrutiny.
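To make the shape of the fix concrete, here is a minimal sketch in plain Python. It is a toy model, not Synapse code: `LoggingContext`, `PreserveLoggingContext`, `current_context()` and `set_current_context()` are simplified stand-ins for the real ones, and `ToyDeferred`, `resume_waiter`, `run_timer` and the two `time_it_out_*` variants are hypothetical. It assumes that firing the deferred's errback resumes the waiting code (which leaves its own logcontext active) and that the timer expects its callback to return in the same context it started in.

```python
# Toy model of Synapse-style logcontexts (NOT Synapse's real implementation).
class LoggingContext:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"LoggingContext({self.name!r})"


SENTINEL_CONTEXT = LoggingContext("sentinel")
_current = SENTINEL_CONTEXT


def current_context() -> LoggingContext:
    return _current


def set_current_context(ctx: LoggingContext) -> LoggingContext:
    """Activate `ctx` and return the previously active context."""
    global _current
    old, _current = _current, ctx
    return old


class PreserveLoggingContext:
    """Switch to `new_context` (sentinel by default); restore the old one on exit."""

    def __init__(self, new_context: LoggingContext = SENTINEL_CONTEXT) -> None:
        self._new_context = new_context

    def __enter__(self) -> None:
        self._old_context = set_current_context(self._new_context)

    def __exit__(self, *exc_info) -> None:
        set_current_context(self._old_context)


class ToyDeferred:
    """Hypothetical Deferred stand-in: cancel() fires errbacks synchronously."""

    def __init__(self) -> None:
        self.called = False
        self._errbacks = []

    def add_errback(self, fn) -> None:
        self._errbacks.append(fn)

    def cancel(self) -> None:
        self.called = True
        for fn in self._errbacks:
            fn("CancelledError")


REQUEST_CONTEXT = LoggingContext("request")


def resume_waiter(_failure: str) -> None:
    # The code awaiting the deferred resumes in its own logcontext and,
    # from the point of view of whoever fired the errback, leaves it active.
    set_current_context(REQUEST_CONTEXT)


def time_it_out_before_fix(deferred: ToyDeferred) -> None:
    deferred.cancel()  # leaks REQUEST_CONTEXT back to the timer


def time_it_out_after_fix(deferred: ToyDeferred) -> None:
    with PreserveLoggingContext():
        deferred.cancel()  # whatever the errbacks switch to is discarded on exit


def run_timer(callback) -> LoggingContext:
    """Toy timer: start the callback in the sentinel context and report the
    context it leaves behind (a well-behaved callback leaves the sentinel)."""
    deferred = ToyDeferred()
    deferred.add_errback(resume_waiter)
    set_current_context(SENTINEL_CONTEXT)
    callback(deferred)
    return current_context()


print(run_timer(time_it_out_before_fix))  # LoggingContext('request')
print(run_timer(time_it_out_after_fix))   # LoggingContext('sentinel')
```

In the toy, the only difference between the two variants is the `with PreserveLoggingContext():` block, which mirrors the two-line diff in this PR.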
```python
with PreserveLoggingContext():
    deferred.cancel()
```
In an ideal world, I think it should be possible to call the deferred callbacks/errbacks/cancel with some logcontext. I spent too much time trying to figure out the intricacies here and trying to use solution 3 from the Deferred callbacks docs, but wasn't successful.
It makes me question if what I wrote there is correct in the first place 🤔
The problem is that calling `deferred.cancel()` can change the logcontext, but the deferred is also complete at that point. Because the deferred is already complete, `run_in_background(lambda: (deferred.cancel(), deferred)[1])` assumes that the logcontext was unchanged and returns the deferred as-is, which leaves the logcontext messed up for the caller.
I can't tell whether the problem is (a) our function just isn't following the logcontext rules (and how to resolve that well), or (b) we should `set_current_context(calling_context)` regardless of whether the deferred is already complete.
Instead of banging my head against this more, I've opted for the simple route with `with PreserveLoggingContext():`, which also matches what we do elsewhere in the codebase. Something to improve in the future ⏩
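The failure mode described in this comment can be reproduced with a small toy model. Everything here is a simplified, hypothetical stand-in and not Synapse's real `run_in_background`: `toy_run_in_background` only mimics the behaviour described above (an already-complete deferred is returned without touching the logcontext), and its `restore_always` flag models option (b).

```python
# Toy model (NOT Synapse's real implementation) of why wrapping a call that
# completes synchronously in a run_in_background-like helper leaks context.
class LoggingContext:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"LoggingContext({self.name!r})"


SENTINEL_CONTEXT = LoggingContext("sentinel")
_current = SENTINEL_CONTEXT


def current_context() -> LoggingContext:
    return _current


def set_current_context(ctx: LoggingContext) -> LoggingContext:
    global _current
    old, _current = _current, ctx
    return old


class ToyDeferred:
    """Hypothetical Deferred stand-in: cancel() fires errbacks synchronously."""

    def __init__(self) -> None:
        self.called = False
        self._errbacks = []

    def add_errback(self, fn) -> None:
        self._errbacks.append(fn)

    def cancel(self) -> None:
        self.called = True
        for fn in self._errbacks:
            fn("CancelledError")


def toy_run_in_background(f, restore_always: bool = False) -> ToyDeferred:
    calling_context = current_context()
    res = f()
    if res.called and not restore_always:
        # Behaviour described above: a completed deferred is taken to mean `f`
        # followed the rules and left the logcontext unchanged.
        return res
    # Option (b): restore the caller's context regardless. For a
    # not-yet-complete deferred this toy also just restores the context;
    # any further callback wiring is omitted.
    set_current_context(calling_context)
    return res


CALLER_CONTEXT = LoggingContext("caller")
WAITER_CONTEXT = LoggingContext("waiter")


def context_after_cancel(restore_always: bool) -> LoggingContext:
    deferred = ToyDeferred()
    # Firing the errback resumes the waiter, which leaves its context active.
    deferred.add_errback(lambda _failure: set_current_context(WAITER_CONTEXT))
    set_current_context(CALLER_CONTEXT)
    toy_run_in_background(lambda: (deferred.cancel(), deferred)[1], restore_always)
    return current_context()


print(context_after_cancel(restore_always=False))  # LoggingContext('waiter')
print(context_after_cancel(restore_always=True))   # LoggingContext('caller')
```

Option (b) hides the leak at the call site; the `PreserveLoggingContext` route taken in this PR instead stops `time_it_out()` from leaking in the first place.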
anoadragon453 left a comment
Thanks for writing a regression test - it and the fix look sound to me.
As I said below, I just about understand logcontexts, but I think I struggle to apply the rules when reading arbitrary code. If anything, I appreciate the guardrails we add to warn/raise when the rules are broken. Perhaps adding more of those would help determine whether `timeout_deferred` / `time_it_out` is doing things correctly?
I think I somewhat understand the problem, but don't really understand it enough to give you an answer here.
If you ever want to jump on a call to try and talk this through (or even rubber-duck it with me) shout and I'd be happy to :)
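One shape the guardrail suggested in this review could take is a wrapper that checks a callback returns with the same logcontext it was called with. This is a hypothetical sketch over a toy logcontext model; `expect_logcontext_unchanged`, `well_behaved` and `leaky` are illustrative names, not existing Synapse helpers.

```python
import functools


# Toy logcontext state (a stand-in for Synapse's; NOT the real implementation).
class LoggingContext:
    def __init__(self, name: str) -> None:
        self.name = name

    def __repr__(self) -> str:
        return f"LoggingContext({self.name!r})"


SENTINEL_CONTEXT = LoggingContext("sentinel")
_current = SENTINEL_CONTEXT


def current_context() -> LoggingContext:
    return _current


def set_current_context(ctx: LoggingContext) -> LoggingContext:
    global _current
    old, _current = _current, ctx
    return old


def expect_logcontext_unchanged(fn):
    """Hypothetical guardrail: raise if `fn` returns with a different active
    logcontext than the one it was called with."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        before = current_context()
        result = fn(*args, **kwargs)
        after = current_context()
        if after is not before:
            # Undo the leak before complaining so the caller isn't left in
            # the wrong context.
            set_current_context(before)
            raise AssertionError(
                f"{fn.__name__} changed the logcontext from {before!r} to {after!r}"
            )
        return result

    return wrapper


OTHER_CONTEXT = LoggingContext("other")


@expect_logcontext_unchanged
def well_behaved() -> str:
    old = set_current_context(OTHER_CONTEXT)
    try:
        return "done"
    finally:
        set_current_context(old)  # puts the caller's context back


@expect_logcontext_unchanged
def leaky() -> str:
    set_current_context(OTHER_CONTEXT)  # never restored: breaks the rules
    return "done"


print(well_behaved())  # done
try:
    leaky()
except AssertionError as e:
    print(e)
```

A check like this only catches leaks on the code paths that actually run, so it complements (rather than replaces) regression tests like the one added in this PR.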
Thanks for the review @anoadragon453 🦢
Same fix as #19090

Spawning from working on clean tenant deprovisioning in the Synapse Pro for small hosts project (element-hq/synapse-small-hosts#204).
Fix lost logcontext when using `timeout_deferred(...)` and things actually time out.

Fix #19087 (our HTTP client times out requests using `timeout_deferred(...)`)

Fix #19066 (`/sync` uses `notifier.wait_for_events()`, which uses `timeout_deferred(...)` under the hood)

When/why did these lost logcontext warnings start happening?

In #18828, we switched `timeout_deferred(...)` from using `reactor.callLater(...)` to `clock.call_later(...)` under the hood. This meant it started dealing with logcontexts, but our `time_it_out()` callback didn't follow our Synapse logcontext rules.

Dev notes
Pull Request Checklist