Skip to content

Fix bad deferred logcontext handling#19180

Merged
MadLittleMods merged 12 commits intodevelopfrom
madlittlemods/fix-bad-deferred-logcontext-handling
Nov 14, 2025
Merged

Fix bad deferred logcontext handling#19180
MadLittleMods merged 12 commits intodevelopfrom
madlittlemods/fix-bad-deferred-logcontext-handling

Conversation

@MadLittleMods
Copy link
Copy Markdown
Contributor

@MadLittleMods MadLittleMods commented Nov 13, 2025

Fix bad deferred logcontext handling

These aren't really something personally experienced but I just went around the codebase looking for all of the Deferred .callback, .errback, and .cancel and wrapped them with PreserveLoggingContext()

Spawning from wanting to solve #19165 but unconfirmed whether this has any effect.

To explain the fix, see the Deferred callbacks section of our logcontext docs for more info (specifically using solution 2).

Heads-up, I wrote the docs too so it's my assumptions/understanding all the way down. Apply your own scrutiny.

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@MadLittleMods MadLittleMods force-pushed the madlittlemods/fix-bad-deferred-logcontext-handling branch from 95de67d to e2a0891 Compare November 13, 2025 17:24
@MadLittleMods MadLittleMods force-pushed the madlittlemods/fix-bad-deferred-logcontext-handling branch from e2a0891 to d496cd0 Compare November 13, 2025 17:28
deferred=stop_cancellation(notify_deferred),
timeout=delay_ms / 1000,
clock=self._clock,
)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same fix as #19146

Comment thread changelog.d/19180.misc
@@ -0,0 +1 @@
Fix bad deferred logcontext handling across the codebase.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on how often and easy it is to get this wrong, ideally, we'd have some lint for this. But not for now ⏩

@MadLittleMods MadLittleMods force-pushed the madlittlemods/fix-bad-deferred-logcontext-handling branch from d496cd0 to 13dd110 Compare November 13, 2025 17:57
Comment thread synapse/http/client.py
Comment on lines +1043 to +1044
with PreserveLoggingContext():
self.deferred.errback(BodyExceededMaxSize())
Copy link
Copy Markdown
Contributor Author

@MadLittleMods MadLittleMods Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I painted with a wide brush, I think this is necessary because the deferred is being passed in and could have whatever callback chain which could change logcontext.

In practice, I don't think these really had callback chains to worry about which is why we probably didn't experience problems in this area before.

Applies with a bunch of the Twisted HTTP machinery type stuff here.

deferred=stop_cancellation(notify_deferred),
timeout=delay_ms / 1000,
clock=self._clock,
)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bunch of smoke here that fixing AwakenableSleeper may fix #19165

PerDestinationQueue (federation_transaction_transmission_loop in the logs) may be intertwined (not sure, not following the layers quite concretely) with MatrixFederationHttpClient which uses AwakenableSleeper

Comment on lines +816 to +817
with PreserveLoggingContext():
new_d.errback(defer.TimeoutError("Timed out after %gs" % (timeout,)))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems necessary as we return new_d and people can chain whatever on top of it.

Usually, there are things chained that change the logcontext because of this pattern: await make_deferred_yieldable(timeout_deferred(...)) (make_deferred_yieldable changes the logcontext)

@MadLittleMods MadLittleMods marked this pull request as ready for review November 13, 2025 20:22
@MadLittleMods MadLittleMods requested a review from a team as a code owner November 13, 2025 20:22
Copy link
Copy Markdown
Member

@devonh devonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in the PR is as per previous logcontext fixes, including following similar patterns as #19146.
It would be nice to have a linter in the future to catch raw deferred handling like this as you mentioned.

Hopefully these changes can cut down on some of the leaked logcontexts going forward. We'll have to just merge this in and run it in the wild for a while to see if it makes any difference in practice.

@MadLittleMods MadLittleMods merged commit edc0de9 into develop Nov 14, 2025
78 of 80 checks passed
@MadLittleMods MadLittleMods deleted the madlittlemods/fix-bad-deferred-logcontext-handling branch November 14, 2025 17:21
@MadLittleMods
Copy link
Copy Markdown
Contributor Author

Thanks for the review @devonh 🐪

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants