Explain how Deferred callbacks interact with logcontexts (#18914)

MadLittleMods · web-flow · commit 5266e423e20f · 2025-09-24T16:20:42.000-05:00
Spawning from matrix-org/synapse#12588 (comment) > It turns out `Deferred.cancel()` is a lot like `Deferred.callback()`/`errback()` in that it will trash the logging context: > it can resume a coroutine, which will restore its own logging context, then run: > > - until it blocks, setting the sentinel context > - or until it terminates, setting the context it was started with > > So we need to wrap it in `with PreserveLoggingContext():`, like we do with `.callback()`: > > ```python > with PreserveLoggingContext(): > self.render_deferred.cancel() > ``` > > *-- @squahtx, matrix-org/synapse#12588 (comment)
diff --git a/changelog.d/18914.doc b/changelog.d/18914.doc
@@ -0,0 +1 @@
+Explain how Deferred callbacks interact with logcontexts.
diff --git a/docs/log_contexts.md b/docs/log_contexts.md
@@ -143,8 +143,7 @@ cares about.
 The following sections describe pitfalls and helpful patterns when
 implementing these rules.
 
-Always await your awaitables
-----------------------------
+## Always await your awaitables
 
 Whenever you get an awaitable back from a function, you should `await` on
 it as soon as possible. Do not pass go; do not do any logging; do not
@@ -203,6 +202,171 @@ async def sleep(seconds):
     return await context.make_deferred_yieldable(get_sleep_deferred(seconds))
 ```
 
+## Deferred callbacks
+
+When a deferred callback is called, it inherits the current logcontext. The deferred
+callback chain can resume a coroutine, which if following our logcontext rules, will
+restore its own logcontext, then run:
+
+ - until it yields control back to the reactor, setting the sentinel logcontext
+ - or until it finishes, restoring the logcontext it was started with (calling context)
+
+This behavior creates two specific issues:
+
+**Issue 1:** The first issue is that the callback may have reset the logcontext to the
+sentinel before returning. This means our calling function will continue with the
+sentinel logcontext instead of the logcontext it was started with (bad).
+
+**Issue 2:** The second issue is that the current logcontext that called the deferred
+callback could finish before the callback finishes (bad).
+
+In the following example, the deferred callback is called with the "main" logcontext and
+runs until we yield control back to the reactor in the `await` inside `clock.sleep(0)`.
+Since `clock.sleep(0)` follows our logcontext rules, it sets the logcontext to the
+sentinel before yielding control back to the reactor. Our `main` function continues with
+the sentinel logcontext (first bad thing) instead of the "main" logcontext. Then the
+`with LoggingContext("main")` block exits, finishing the "main" logcontext and yielding
+control back to the reactor again. Finally, later on when `clock.sleep(0)` completes,
+our `with LoggingContext("competing")` block exits, and restores the previous "main"
+logcontext which has already finished, resulting in `WARNING: Re-starting finished log
+context main` and leaking the `main` logcontext into the reactor which will then
+erronously be associated with the next task the reactor picks up.
+
+```python
+async def competing_callback():
+    # Since this is run with the "main" logcontext, when the "competing"
+    # logcontext exits, it will restore the previous "main" logcontext which has
+    # already finished and results in "WARNING: Re-starting finished log context main"
+    # and leaking the `main` logcontext into the reactor.
+    with LoggingContext("competing"):
+        await clock.sleep(0)
+
+def main():
+    with LoggingContext("main"):
+        d = defer.Deferred()
+        d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+        # Call the callback within the "main" logcontext.
+        d.callback(None)
+        # Bad: This will be logged against sentinel logcontext
+        logger.debug("ugh")
+
+main()
+```
+
+**Solution 1:** We could of course fix this by following the general rule of "always
+await your awaitables":
+
+```python
+async def main():
+    with LoggingContext("main"):
+        d = defer.Deferred()
+        d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+        d.callback(None)
+        # Wait for `d` to finish before continuing so the "main" logcontext is
+        # still active. This works because `d` already follows our logcontext
+        # rules. If not, we would also have to use `make_deferred_yieldable(d)`.
+        await d
+        # Good: This will be logged against the "main" logcontext
+        logger.debug("phew")
+``` 
+
+**Solution 2:** We could also fix this by surrounding the call to `d.callback` with a
+`PreserveLoggingContext`, which will reset the logcontext to the sentinel before calling
+the callback, and restore the "main" logcontext afterwards before continuing the `main`
+function. This solves the problem because when the "competing" logcontext exits, it will
+restore the sentinel logcontext which is never finished by its nature, so there is no
+warning and no leakage into the reactor.
+
+```python
+async def main():
+    with LoggingContext("main"):
+        d = defer.Deferred()
+        d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+        d.callback(None)
+        with PreserveLoggingContext():
+            # Call the callback with the sentinel logcontext.
+            d.callback(None)
+        # Good: This will be logged against the "main" logcontext
+        logger.debug("phew")
+```
+
+**Solution 3:** But let's say you *do* want to run (fire-and-forget) the deferred
+callback in the current context without running into issues:
+
+We can solve the first issue by using `run_in_background(...)` to run the callback in
+the current logcontext and it handles the magic behind the scenes of a) restoring the
+calling logcontext before returning to the caller and b) resetting the logcontext to the
+sentinel after the deferred completes and we yield control back to the reactor to avoid
+leaking the logcontext into the reactor.
+
+To solve the second issue, we can extend the lifetime of the "main" logcontext by
+avoiding the `LoggingContext`'s context manager lifetime methods
+(`__enter__`/`__exit__`). We can still set "main" as the current logcontext by using
+`PreserveLoggingContext` and passing in the "main" logcontext.
+
+
+```python
+async def main():
+    main_context = LoggingContext("main")
+    with PreserveLoggingContext(main_context):
+        d = defer.Deferred()
+        d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+        # The whole lambda will be run in the "main" logcontext. But we're using
+        # a trick to return the deferred `d` itself so that `run_in_background`
+        # will wait on that to complete and reset the logcontext to the sentinel
+        # when it does to avoid leaking the "main" logcontext into the reactor.
+        run_in_background(lambda: (d.callback(None), d)[1])
+        # Good: This will be logged against the "main" logcontext
+        logger.debug("phew")
+
+...
+
+# Wherever possible, it's best to finish the logcontext by calling `__exit__` at some
+# point. This allows us to catch bugs if we later try to erroneously restart a finished
+# logcontext.
+#
+# Since the "main" logcontext stores the `LoggingContext.previous_context` when it is
+# created, we can wrap this call in `PreserveLoggingContext()` to restore the correct
+# previous logcontext. Our goal is to have the calling context remain unchanged after
+# finishing the "main" logcontext.
+with PreserveLoggingContext():
+    # Finish the "main" logcontext
+    with main_context:
+        # Empty block - We're just trying to call `__exit__` on the "main" context
+        # manager to finish it. We can't call `__exit__` directly as the code expects us
+        # to `__enter__` before calling `__exit__` to `start`/`stop` things
+        # appropriately. And in any case, it's probably best not to call the internal
+        # methods directly.
+        pass
+```
+
+The same thing applies if you have some deferreds stored somewhere which you want to
+callback in the current logcontext.
+
+
+### Deferred errbacks and cancellations
+
+The same care should be taken when calling errbacks on deferreds. An errback and
+callback act the same in this regard (see section above).
+
+```python
+d = defer.Deferred()
+d.addErrback(some_other_function)
+d.errback(failure)
+```
+
+Additionally, cancellation is the same as directly calling the errback with a
+`twisted.internet.defer.CancelledError`:
+
+```python
+d = defer.Deferred()
+d.addErrback(some_other_function)
+d.cancel()
+```
+
+
+
+
 ## Fire-and-forget
 
 Sometimes you want to fire off a chain of execution, but not wait for
diff --git a/tests/util/test_logcontext.py b/tests/util/test_logcontext.py
@@ -249,6 +249,204 @@ async def competing_callback() -> None:
         # Back to the sentinel context
         self._check_test_key("sentinel")
 
+    @logcontext_clean
+    async def test_deferred_callback_await_in_current_logcontext(self) -> None:
+        """
+        Test that calling the deferred callback in the current logcontext ("foo") and
+        waiting for it to finish in a logcontext blocks works as expected.
+
+        Works because "always await your awaitables".
+
+        Demonstrates one pattern that we can use fix the naive case where we just call
+        `d.callback(None)` without anything else. See the *Deferred callbacks* section
+        of docs/log_contexts.md for more details.
+        """
+        clock = Clock(reactor)
+
+        # Sanity check that we start in the sentinel context
+        self._check_test_key("sentinel")
+
+        callback_finished = False
+
+        async def competing_callback() -> None:
+            nonlocal callback_finished
+            try:
+                # The deferred callback should have the same logcontext as the caller
+                self._check_test_key("foo")
+
+                with LoggingContext("competing"):
+                    await clock.sleep(0)
+                    self._check_test_key("competing")
+
+                self._check_test_key("foo")
+            finally:
+                # When exceptions happen, we still want to mark the callback as finished
+                # so that the test can complete and we see the underlying error.
+                callback_finished = True
+
+        with LoggingContext("foo"):
+            d: defer.Deferred[None] = defer.Deferred()
+            d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+            self._check_test_key("foo")
+            d.callback(None)
+            # The fix for the naive case is here (i.e. things don't work correctly if we
+            # don't await here).
+            #
+            # Wait for `d` to finish before continuing so the "main" logcontext is
+            # still active. This works because `d` already follows our logcontext
+            # rules. If not, we would also have to use `make_deferred_yieldable(d)`.
+            await d
+            self._check_test_key("foo")
+
+        await clock.sleep(0)
+
+        self.assertTrue(
+            callback_finished,
+            "Callback never finished which means the test probably didn't wait long enough",
+        )
+
+        # Back to the sentinel context
+        self._check_test_key("sentinel")
+
+    @logcontext_clean
+    async def test_deferred_callback_preserve_logging_context(self) -> None:
+        """
+        Test that calling the deferred callback inside `PreserveLoggingContext()` (in
+        the sentinel context) works as expected.
+
+        Demonstrates one pattern that we can use fix the naive case where we just call
+        `d.callback(None)` without anything else. See the *Deferred callbacks* section
+        of docs/log_contexts.md for more details.
+        """
+        clock = Clock(reactor)
+
+        # Sanity check that we start in the sentinel context
+        self._check_test_key("sentinel")
+
+        callback_finished = False
+
+        async def competing_callback() -> None:
+            nonlocal callback_finished
+            try:
+                # The deferred callback should have the same logcontext as the caller
+                self._check_test_key("sentinel")
+
+                with LoggingContext("competing"):
+                    await clock.sleep(0)
+                    self._check_test_key("competing")
+
+                self._check_test_key("sentinel")
+            finally:
+                # When exceptions happen, we still want to mark the callback as finished
+                # so that the test can complete and we see the underlying error.
+                callback_finished = True
+
+        with LoggingContext("foo"):
+            d: defer.Deferred[None] = defer.Deferred()
+            d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+            self._check_test_key("foo")
+            # The fix for the naive case is here (i.e. things don't work correctly if we
+            # don't `PreserveLoggingContext()` here).
+            #
+            # `PreserveLoggingContext` will reset the logcontext to the sentinel before
+            # calling the callback, and restore the "foo" logcontext afterwards before
+            # continuing the foo block. This solves the problem because when the
+            # "competing" logcontext exits, it will restore the sentinel logcontext
+            # which is never finished by its nature, so there is no warning and no
+            # leakage into the reactor.
+            with PreserveLoggingContext():
+                d.callback(None)
+            self._check_test_key("foo")
+
+        await clock.sleep(0)
+
+        self.assertTrue(
+            callback_finished,
+            "Callback never finished which means the test probably didn't wait long enough",
+        )
+
+        # Back to the sentinel context
+        self._check_test_key("sentinel")
+
+    @logcontext_clean
+    async def test_deferred_callback_fire_and_forget_with_current_context(self) -> None:
+        """
+        Test that it's possible to call the deferred callback with the current context
+        while fire-and-forgetting the callback (no adverse effects like leaking the
+        logcontext into the reactor or restarting an already finished logcontext).
+
+        Demonstrates one pattern that we can use fix the naive case where we just call
+        `d.callback(None)` without anything else. See the *Deferred callbacks* section
+        of docs/log_contexts.md for more details.
+        """
+        clock = Clock(reactor)
+
+        # Sanity check that we start in the sentinel context
+        self._check_test_key("sentinel")
+
+        callback_finished = False
+
+        async def competing_callback() -> None:
+            nonlocal callback_finished
+            try:
+                # The deferred callback should have the same logcontext as the caller
+                self._check_test_key("foo")
+
+                with LoggingContext("competing"):
+                    await clock.sleep(0)
+                    self._check_test_key("competing")
+
+                self._check_test_key("foo")
+            finally:
+                # When exceptions happen, we still want to mark the callback as finished
+                # so that the test can complete and we see the underlying error.
+                callback_finished = True
+
+        # Part of fix for the naive case is here (i.e. things don't work correctly if we
+        # don't `PreserveLoggingContext(...)` here).
+        #
+        # We can extend the lifetime of the "foo" logcontext is to avoid calling the
+        # context manager lifetime methods of `LoggingContext` (`__enter__`/`__exit__`).
+        # And we can still set the current logcontext by using `PreserveLoggingContext`
+        # and passing in the "foo" logcontext.
+        with PreserveLoggingContext(LoggingContext("foo")):
+            d: defer.Deferred[None] = defer.Deferred()
+            d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
+            self._check_test_key("foo")
+            # Other part of fix for the naive case is here (i.e. things don't work
+            # correctly if we don't `run_in_background(...)` here).
+            #
+            # `run_in_background(...)` will run the whole lambda in the current
+            # logcontext and it handles the magic behind the scenes of a) restoring the
+            # calling logcontext before returning to the caller and b) resetting the
+            # logcontext to the sentinel after the deferred completes and we yield
+            # control back to the reactor to avoid leaking the logcontext into the
+            # reactor.
+            #
+            # We're using a lambda here as a little trick so we can still get everything
+            # to run in the "foo" logcontext, but return the deferred `d` itself so that
+            # `run_in_background` will wait on that to complete before resetting the
+            # logcontext to the sentinel.
+            #
+            # type-ignore[call-overload]: This appears like a mypy type inference bug. A
+            # function that returns a deferred is exactly what `run_in_background`
+            # expects.
+            #
+            # type-ignore[func-returns-value]: This appears like a mypy type inference
+            # bug. We're always returning the deferred `d`.
+            run_in_background(lambda: (d.callback(None), d)[1])  # type: ignore[call-overload, func-returns-value]
+            self._check_test_key("foo")
+
+        await clock.sleep(0)
+
+        self.assertTrue(
+            callback_finished,
+            "Callback never finished which means the test probably didn't wait long enough",
+        )
+
+        # Back to the sentinel context
+        self._check_test_key("sentinel")
+
     async def _test_run_in_background(self, function: Callable[[], object]) -> None:
         clock = Clock(reactor)
 

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1 @@`
	`1`	`+Explain how Deferred callbacks interact with logcontexts.`