Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
5eea20a
Add reproduction test for scenario 1 (not fire-and-forget)
MadLittleMods Oct 21, 2025
6be53d8
Messing around in situ
MadLittleMods Oct 21, 2025
9478194
Better logcontext names in tests
MadLittleMods Oct 21, 2025
33776c0
WIP: tests
MadLittleMods Oct 22, 2025
4b9441e
Transparent deferred callback/errback when we `_set_context_cb`
MadLittleMods Oct 22, 2025
630792c
Revert "Transparent deferred callback/errback when we `_set_context_cb`"
MadLittleMods Oct 22, 2025
73080af
More test twiddling
MadLittleMods Oct 22, 2025
5179ea3
Fix typo
MadLittleMods Oct 22, 2025
202906e
Further clean up test
MadLittleMods Oct 22, 2025
0325f97
Fix comment
MadLittleMods Oct 22, 2025
18ae299
Just settle for good enough and works
MadLittleMods Oct 22, 2025
a31eadd
Remove cruft
MadLittleMods Oct 22, 2025
71d490c
Add changelog
MadLittleMods Oct 22, 2025
8328e1f
Merge branch 'develop' into madlittlemods/19087-logcontext-lost-timeo…
MadLittleMods Oct 22, 2025
140bb59
Remove debug logs
MadLittleMods Oct 22, 2025
9aa4919
Fix unrelated lints from `develop`
MadLittleMods Oct 22, 2025
7bb2ffc
Make it more clear what's being tested
MadLittleMods Oct 22, 2025
4e576b7
Merge branch 'develop' into madlittlemods/19087-logcontext-lost-timeo…
MadLittleMods Oct 22, 2025
a782d04
Merge branch 'develop' into madlittlemods/19087-logcontext-lost-timeo…
MadLittleMods Oct 23, 2025
3100b63
Make it more clear when nothing is wrong
MadLittleMods Oct 30, 2025
0308e6b
Merge branch 'develop' into madlittlemods/19087-logcontext-lost-timeo…
MadLittleMods Oct 30, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/19090.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix lost logcontext warnings from timeouts in sync and requests made by Synapse itself.
22 changes: 16 additions & 6 deletions synapse/logging/context.py
Original file line number Diff line number Diff line change
Expand Up @@ -896,7 +896,7 @@ def run_in_background(
# If the function messes with logcontexts, we can assume it follows the Synapse
# logcontext rules (Rules for functions returning awaitables: "If the awaitable
# is already complete, the function returns with the same logcontext it started
# with."). If it function doesn't touch logcontexts at all, we can also assume
# with."). If the function doesn't touch logcontexts at all, we can also assume
# the logcontext is unchanged.
#
# Either way, the function should have maintained the calling logcontext, so we
Expand All @@ -905,11 +905,21 @@ def run_in_background(
# to reset the logcontext to the sentinel logcontext as that would run
# immediately (remember our goal is to maintain the calling logcontext when we
# return).
logcontext_debug_logger.debug(
"run_in_background(%s): deferred already completed and the function should have maintained the logcontext %s",
instance_id,
calling_context,
)
if current_context() != calling_context:
logcontext_error(
"run_in_background(%s): deferred already completed but the function did not maintain the calling logcontext %s (found %s)"
% (
instance_id,
calling_context,
current_context(),
)
)
Comment thread
anoadragon453 marked this conversation as resolved.
else:
logcontext_debug_logger.debug(
"run_in_background(%s): deferred already completed (maintained the calling logcontext %s)",
instance_id,
calling_context,
)
return d

# Since the function we called may follow the Synapse logcontext rules (Rules for
Expand Down
3 changes: 2 additions & 1 deletion synapse/util/async_helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -808,7 +808,8 @@ def time_it_out() -> None:
timed_out[0] = True

try:
deferred.cancel()
with PreserveLoggingContext():
deferred.cancel()
Comment on lines +811 to +812
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main fix!

See the Deferred callbacks section of our logcontext docs for more info (specifically using solution 2).

Heads-up, I wrote the docs too so it's my assumptions/understanding all the way down. Apply your own scrutiny.

Comment on lines +811 to +812
Copy link
Copy Markdown
Contributor Author

@MadLittleMods MadLittleMods Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an ideal world, I think it should be possible to call the deferred callback/errbacks/cancel with some logcontext. I spent too much time trying to figure out the intricacies here and trying to use solution 3 from the Deferred callbacks docs but wasn't successful.

It makes me question if what I wrote there is correct in the first place 🤔


The problem is that calling deferred.cancel() can change the logcontext but it's also complete at that point. Because the deferred is already complete, run_in_background(lambda: (deferred.cancel(), deferred)[1]) assumes that the logcontext was unchanged and returns it as-is which leaves the logcontext is messed up for the caller.

I can't tell if the problem is a) our function just isn't following logcontext rules (and how to resolve that well) or b) we should set_current_context(calling_context) regardless of whether the deferred is already complete.

Instead of banging my head against this more, I've opted to go for the simple route with PreserveLoggingContext():. Which also matches what we do elsewhere in the codebase. Something to improve in the future ⏩

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I somewhat understand the problem, but don't really understand it enough to give you an answer here.

If you ever want to jump on a call to try and talk this through (or even rubber-duck it with me) shout and I'd be happy to :)

except Exception: # if we throw any exception it'll break time outs
logger.exception("Canceller failed during timeout")

Expand Down
2 changes: 1 addition & 1 deletion synapse/util/clock.py
Original file line number Diff line number Diff line change
Expand Up @@ -266,7 +266,7 @@ def wrapped_callback(*args: Any, **kwargs: Any) -> None:
# We use `PreserveLoggingContext` to prevent our new `call_later`
# logcontext from finishing as soon as we exit this function, in case `f`
# returns an awaitable/deferred which would continue running and may try to
# restore the `loop_call` context when it's done (because it's trying to
# restore the `call_later` context when it's done (because it's trying to
Comment thread
anoadragon453 marked this conversation as resolved.
# adhere to the Synapse logcontext rules.)
#
# This also ensures that we return to the `sentinel` context when we exit
Expand Down
3 changes: 3 additions & 0 deletions tests/unittest.py
Original file line number Diff line number Diff line change
Expand Up @@ -341,6 +341,9 @@ def logcontext_clean(target: TV) -> TV:
"""

def logcontext_error(msg: str) -> NoReturn:
# Log so we can still see it in the logs like normal
logger.warning(msg)
Comment thread
anoadragon453 marked this conversation as resolved.
# But also fail the test
raise AssertionError("logcontext error: %s" % (msg))

patcher = patch("synapse.logging.context.logcontext_error", new=logcontext_error)
Expand Down
68 changes: 66 additions & 2 deletions tests/util/test_async_helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@
)

from tests.server import get_clock
from tests.unittest import TestCase
from tests.unittest import TestCase, logcontext_clean

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -198,7 +198,12 @@ def canceller(_d: Deferred) -> None:

self.failureResultOf(timing_out_d, defer.TimeoutError)

async def test_logcontext_is_preserved_on_cancellation(self) -> None:
@logcontext_clean
async def test_logcontext_is_preserved_on_timeout_cancellation(self) -> None:
"""
Test that the logcontext is preserved when we timeout and the deferred is
cancelled.
"""
# Sanity check that we start in the sentinel context
self.assertEqual(current_context(), SENTINEL_CONTEXT)

Expand Down Expand Up @@ -270,6 +275,65 @@ def mark_was_cancelled(res: Failure) -> None:
# Back to the sentinel context
self.assertEqual(current_context(), SENTINEL_CONTEXT)

@logcontext_clean
async def test_logcontext_is_not_lost_when_awaiting_on_timeout_cancellation(
self,
) -> None:
Comment thread
anoadragon453 marked this conversation as resolved.
"""
Test that the logcontext isn't lost when we `await make_deferred_yieldable(...)`
the deferred to complete/timeout and it times out.
"""

# Sanity check that we start in the sentinel context
self.assertEqual(current_context(), SENTINEL_CONTEXT)

# Create a deferred which we will never complete
incomplete_d: Deferred = Deferred()

async def competing_task() -> None:
with LoggingContext(
name="competing", server_name="test_server"
) as context_competing:
timing_out_d = timeout_deferred(
deferred=incomplete_d,
timeout=1.0,
clock=self.clock,
)
self.assertNoResult(timing_out_d)
# We should still be in the logcontext we started in
self.assertIs(current_context(), context_competing)

# Mimic the normal use case to wait for the work to complete or timeout.
#
# In this specific test, we expect the deferred to timeout and raise an
# exception at this point.
await make_deferred_yieldable(timing_out_d)

self.fail(
"We should not make it to this point as the `timing_out_d` should have been cancelled"
)

d = defer.ensureDeferred(competing_task())

# Still in the sentinel context
self.assertEqual(current_context(), SENTINEL_CONTEXT)

# Pump until we trigger the timeout
self.reactor.pump(
# We only need to pump `1.0` (seconds) as we set
# `timeout_deferred(timeout=1.0)` above
(1.0,)
)

# Still in the sentinel context
self.assertEqual(current_context(), SENTINEL_CONTEXT)

# We expect a failure due to the timeout
self.failureResultOf(d, defer.TimeoutError)

# Back to the sentinel context at the end of the day
self.assertEqual(current_context(), SENTINEL_CONTEXT)


class _TestException(Exception): #
pass
Expand Down
Loading