
Fix lost logcontext when using timeout_deferred(...) #19090

Merged
MadLittleMods merged 21 commits into develop from
madlittlemods/19087-logcontext-lost-timeout-deferred-call-later
Oct 30, 2025

@MadLittleMods (Contributor) commented Oct 22, 2025

Fix lost logcontext when using timeout_deferred(...) and things actually time out.

Fix #19087 (our HTTP client times out requests using timeout_deferred(...))
Fix #19066 (/sync uses notifier.wait_for_events(), which uses timeout_deferred(...) under the hood)

When/why did these lost logcontext warnings start happening?

synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later but found POST-2453

synapse.logging.context - 107 - WARNING - sentinel - Expected logging context call_later was lost

In #18828, we switched timeout_deferred(...) from using reactor.callLater(...) to clock.call_later(...) under the hood. This meant it started dealing with logcontexts, but our time_it_out() callback didn't follow our Synapse logcontext rules.
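A toy, stdlib-only model of this failure may help; it is not Synapse's code. Here toy_call_later is a hypothetical stand-in for clock.call_later(...): it runs the callback in a context named call_later, then checks which context it finds afterwards and emits warnings shaped like the ones quoted above. Logcontexts are modelled as plain strings, ToyDeferred is a hypothetical minimal Deferred whose cancel() fires its errbacks synchronously, and the errback that switches to POST-2453 stands in for the awaiting request resuming in its own context.

```python
import logging
from typing import Callable, List

# Toy model only: logcontexts are plain strings and "the current context" is
# a module-level variable (Synapse uses real LoggingContext objects).
logger = logging.getLogger("toy.logcontext")

SENTINEL = "sentinel"
_current = SENTINEL


def current_context() -> str:
    return _current


def set_current_context(ctx: str) -> str:
    """Switch the current context and return the previous one."""
    global _current
    old, _current = _current, ctx
    return old


class ToyDeferred:
    """Hypothetical minimal Deferred: cancel() fires errbacks synchronously."""

    def __init__(self) -> None:
        self._errbacks: List[Callable[[], None]] = []

    def add_errback(self, fn: Callable[[], None]) -> None:
        self._errbacks.append(fn)

    def cancel(self) -> None:
        for fn in self._errbacks:
            fn()


def toy_call_later(callback: Callable[[], None]) -> str:
    """Hypothetical stand-in for clock.call_later(...): run `callback` in a
    'call_later' context, warn if it leaves a different context behind,
    restore the previous context and return the context that was found."""
    previous = set_current_context("call_later")
    callback()
    found = current_context()
    if found == SENTINEL:
        logger.warning("Expected logging context call_later was lost")
    elif found != "call_later":
        logger.warning("Expected logging context call_later but found %s", found)
    set_current_context(previous)
    return found


def time_it_out_buggy(deferred: ToyDeferred) -> None:
    # Cancelling fires the awaiting code's errbacks in *our* context; they
    # switch to the request's context and nothing switches back.
    deferred.cancel()


d = ToyDeferred()
d.add_errback(lambda: set_current_context("POST-2453"))
leaked = toy_call_later(lambda: time_it_out_buggy(d))
print(leaked)  # POST-2453, after a warning about finding POST-2453
```

In this toy, the callback hands control back to toy_call_later in the request's context rather than call_later, which is the mismatch the warnings report.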

Dev notes

SYNAPSE_TEST_LOG_LEVEL=DEBUG poetry run trial tests.util.test_async_helpers.TimeoutDeferredTest.test_logcontext_is_not_lost_when_awaiting_on_timeout_cancellation
SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.sliding_sync.test_sliding_sync.SlidingSyncTestCase_new.test_wait_for_new_data_timeout

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Comment thread synapse/logging/context.py
Comment on lines +815 to +816
with PreserveLoggingContext():
    deferred.cancel()
Contributor Author


This is the main fix!

See the Deferred callbacks section of our logcontext docs for more info (specifically using solution 2).

Heads-up, I wrote the docs too, so it's my assumptions/understanding all the way down. Apply your own scrutiny.
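A toy, stdlib-only model of why wrapping the cancellation in PreserveLoggingContext() helps: the errbacks fire from the sentinel context, and whatever context they switch to is discarded when the block exits. Everything here is hypothetical and simplified: logcontexts are strings, ToyDeferred is a minimal Deferred, context_after stands in for the clock.call_later(...) wrapper, and ToyPreserveLoggingContext assumes that PreserveLoggingContext() with no argument switches to the sentinel context and restores the previous one on exit.

```python
from typing import Callable, List, Optional

# Toy model only: logcontexts are plain strings.
SENTINEL = "sentinel"
_current = SENTINEL


def current_context() -> str:
    return _current


def set_current_context(ctx: str) -> str:
    """Switch the current context and return the previous one."""
    global _current
    old, _current = _current, ctx
    return old


class ToyPreserveLoggingContext:
    """Hypothetical stand-in for PreserveLoggingContext: switch to
    `new_context` (the sentinel by default), restore the old one on exit."""

    def __init__(self, new_context: str = SENTINEL) -> None:
        self._new_context = new_context
        self._old_context: Optional[str] = None

    def __enter__(self) -> None:
        self._old_context = set_current_context(self._new_context)

    def __exit__(self, *exc_info: object) -> None:
        assert self._old_context is not None
        set_current_context(self._old_context)


class ToyDeferred:
    """Hypothetical minimal Deferred: cancel() fires errbacks synchronously."""

    def __init__(self) -> None:
        self._errbacks: List[Callable[[], None]] = []

    def add_errback(self, fn: Callable[[], None]) -> None:
        self._errbacks.append(fn)

    def cancel(self) -> None:
        for fn in self._errbacks:
            fn()


def time_it_out_fixed(deferred: ToyDeferred) -> None:
    # Fire the errbacks from the sentinel context; whatever context they
    # switch to is thrown away when the block exits and ours is restored.
    with ToyPreserveLoggingContext():
        deferred.cancel()


def context_after(callback: Callable[[], None]) -> str:
    """Run `callback` inside a 'call_later' context and report the context
    it left behind (the value is read before the outer block restores)."""
    with ToyPreserveLoggingContext("call_later"):
        callback()
        return current_context()


d = ToyDeferred()
d.add_errback(lambda: set_current_context("POST-2453"))
print(context_after(lambda: time_it_out_fixed(d)))  # call_later
```

The errback still runs and still switches context, so the awaiting code is unaffected; only the caller of time_it_out is shielded from the change.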

Comment thread synapse/util/clock.py
Comment thread tests/unittest.py
Comment thread tests/util/test_async_helpers.py
Comment on lines +815 to +816
with PreserveLoggingContext():
    deferred.cancel()
Contributor Author

@MadLittleMods Oct 22, 2025


In an ideal world, I think it should be possible to call the deferred's callbacks/errbacks/cancel with some logcontext. I spent too much time trying to figure out the intricacies here and trying to use solution 3 from the Deferred callbacks docs, but wasn't successful.

It makes me question if what I wrote there is correct in the first place 🤔


The problem is that calling deferred.cancel() can change the logcontext, but the deferred is also complete at that point. Because it is already complete, run_in_background(lambda: (deferred.cancel(), deferred)[1]) assumes that the logcontext was left unchanged and returns the deferred as-is, which leaves the logcontext messed up for the caller.

I can't tell whether the problem is (a) our function just isn't following the logcontext rules (and, if so, how to resolve that well) or (b) we should set_current_context(calling_context) regardless of whether the deferred is already complete.
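The fast path described in the comment above can be modelled in a toy, stdlib-only sketch (not Synapse's code). toy_run_in_background is hypothetical: when the returned deferred is already complete it returns it without touching the logcontext, as the comment describes, and the always_restore flag models option (b). This only illustrates the mechanical difference between the two behaviours; it does not settle whether (b) is the right change, and the real run_in_background does more bookkeeping for deferreds that are not yet complete.

```python
from typing import Callable, List

# Toy model only: logcontexts are plain strings.
SENTINEL = "sentinel"
_current = SENTINEL


def current_context() -> str:
    return _current


def set_current_context(ctx: str) -> str:
    """Switch the current context and return the previous one."""
    global _current
    old, _current = _current, ctx
    return old


class ToyDeferred:
    """Hypothetical minimal Deferred: cancel() marks it complete and fires
    its errbacks synchronously."""

    def __init__(self) -> None:
        self._errbacks: List[Callable[[], None]] = []
        self.called = False

    def add_errback(self, fn: Callable[[], None]) -> None:
        self._errbacks.append(fn)

    def cancel(self) -> None:
        self.called = True
        for fn in self._errbacks:
            fn()


def toy_run_in_background(
    fn: Callable[[], ToyDeferred], always_restore: bool = False
) -> ToyDeferred:
    calling_context = current_context()
    d = fn()
    if d.called and not always_restore:
        # Fast path as described in the comment: an already-complete
        # deferred is assumed to have left the logcontext alone.
        return d
    # Option (b): put the caller's context back regardless.
    set_current_context(calling_context)
    return d


def leaked_context(always_restore: bool) -> str:
    """Cancel a deferred via toy_run_in_background from a 'call_later'
    context and report which context the caller is left in."""
    d = ToyDeferred()
    d.add_errback(lambda: set_current_context("POST-2453"))
    previous = set_current_context("call_later")
    toy_run_in_background(lambda: (d.cancel(), d)[1], always_restore)
    found = current_context()
    set_current_context(previous)
    return found


print(leaked_context(always_restore=False))  # POST-2453
print(leaked_context(always_restore=True))  # call_later
```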

Instead of banging my head against this any more, I've opted for the simple route: wrapping deferred.cancel() in PreserveLoggingContext(), which also matches what we do elsewhere in the codebase. Something to improve in the future ⏩

Member


I think I somewhat understand the problem, but don't really understand it enough to give you an answer here.

If you ever want to jump on a call to try and talk this through (or even rubber-duck it with me) shout and I'd be happy to :)

@MadLittleMods marked this pull request as ready for review October 22, 2025 16:16
@MadLittleMods requested a review from a team as a code owner October 22, 2025 16:16
Member

@anoadragon453 left a comment


Thanks for writing a regression test; it and the fix look sound to me.

As I said in my inline comment, I just about understand logcontexts, but I think I struggle to apply the rules when reading arbitrary code. If anything, I appreciate the guardrails we add to warn/raise when the rules are broken. Perhaps adding yet more of those would help determine whether timeout_deferred / time_it_out is doing things correctly?


@MadLittleMods merged commit c0b9437 into develop Oct 30, 2025
44 checks passed
@MadLittleMods deleted the madlittlemods/19087-logcontext-lost-timeout-deferred-call-later branch October 30, 2025 16:49
@MadLittleMods
Contributor Author

Thanks for the review @anoadragon453 🦢

MadLittleMods added a commit that referenced this pull request Nov 3, 2025
Same fix as #19090

Spawning from working on clean tenant deprovisioning in the Synapse Pro
for small hosts project
(element-hq/synapse-small-hosts#204).

Development

Successfully merging this pull request may close these issues.

Lost logcontext with antispam module
Losting logging context on sync request

2 participants