Skip to content

Commit 5266e42

Browse files
Explain how Deferred callbacks interact with logcontexts (#18914)
Spawning from matrix-org/synapse#12588 (comment) > It turns out `Deferred.cancel()` is a lot like `Deferred.callback()`/`errback()` in that it will trash the logging context: > it can resume a coroutine, which will restore its own logging context, then run: > > - until it blocks, setting the sentinel context > - or until it terminates, setting the context it was started with > > So we need to wrap it in `with PreserveLoggingContext():`, like we do with `.callback()`: > > ```python > with PreserveLoggingContext(): > self.render_deferred.cancel() > ``` > > *-- @squahtx, matrix-org/synapse#12588 (comment)
1 parent 0458f69 commit 5266e42

3 files changed

Lines changed: 365 additions & 2 deletions

File tree

changelog.d/18914.doc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Explain how Deferred callbacks interact with logcontexts.

docs/log_contexts.md

Lines changed: 166 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -143,8 +143,7 @@ cares about.
143143
The following sections describe pitfalls and helpful patterns when
144144
implementing these rules.
145145

146-
Always await your awaitables
147-
----------------------------
146+
## Always await your awaitables
148147

149148
Whenever you get an awaitable back from a function, you should `await` on
150149
it as soon as possible. Do not pass go; do not do any logging; do not
@@ -203,6 +202,171 @@ async def sleep(seconds):
203202
return await context.make_deferred_yieldable(get_sleep_deferred(seconds))
204203
```
205204

205+
## Deferred callbacks
206+
207+
When a deferred callback is called, it inherits the current logcontext. The deferred
208+
callback chain can resume a coroutine, which if following our logcontext rules, will
209+
restore its own logcontext, then run:
210+
211+
- until it yields control back to the reactor, setting the sentinel logcontext
212+
- or until it finishes, restoring the logcontext it was started with (calling context)
213+
214+
This behavior creates two specific issues:
215+
216+
**Issue 1:** The first issue is that the callback may have reset the logcontext to the
217+
sentinel before returning. This means our calling function will continue with the
218+
sentinel logcontext instead of the logcontext it was started with (bad).
219+
220+
**Issue 2:** The second issue is that the current logcontext that called the deferred
221+
callback could finish before the callback finishes (bad).
222+
223+
In the following example, the deferred callback is called with the "main" logcontext and
224+
runs until we yield control back to the reactor in the `await` inside `clock.sleep(0)`.
225+
Since `clock.sleep(0)` follows our logcontext rules, it sets the logcontext to the
226+
sentinel before yielding control back to the reactor. Our `main` function continues with
227+
the sentinel logcontext (first bad thing) instead of the "main" logcontext. Then the
228+
`with LoggingContext("main")` block exits, finishing the "main" logcontext and yielding
229+
control back to the reactor again. Finally, later on when `clock.sleep(0)` completes,
230+
our `with LoggingContext("competing")` block exits, and restores the previous "main"
231+
logcontext which has already finished, resulting in `WARNING: Re-starting finished log
232+
context main` and leaking the `main` logcontext into the reactor which will then
233+
erronously be associated with the next task the reactor picks up.
234+
235+
```python
236+
async def competing_callback():
237+
# Since this is run with the "main" logcontext, when the "competing"
238+
# logcontext exits, it will restore the previous "main" logcontext which has
239+
# already finished and results in "WARNING: Re-starting finished log context main"
240+
# and leaking the `main` logcontext into the reactor.
241+
with LoggingContext("competing"):
242+
await clock.sleep(0)
243+
244+
def main():
245+
with LoggingContext("main"):
246+
d = defer.Deferred()
247+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
248+
# Call the callback within the "main" logcontext.
249+
d.callback(None)
250+
# Bad: This will be logged against sentinel logcontext
251+
logger.debug("ugh")
252+
253+
main()
254+
```
255+
256+
**Solution 1:** We could of course fix this by following the general rule of "always
257+
await your awaitables":
258+
259+
```python
260+
async def main():
261+
with LoggingContext("main"):
262+
d = defer.Deferred()
263+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
264+
d.callback(None)
265+
# Wait for `d` to finish before continuing so the "main" logcontext is
266+
# still active. This works because `d` already follows our logcontext
267+
# rules. If not, we would also have to use `make_deferred_yieldable(d)`.
268+
await d
269+
# Good: This will be logged against the "main" logcontext
270+
logger.debug("phew")
271+
```
272+
273+
**Solution 2:** We could also fix this by surrounding the call to `d.callback` with a
274+
`PreserveLoggingContext`, which will reset the logcontext to the sentinel before calling
275+
the callback, and restore the "main" logcontext afterwards before continuing the `main`
276+
function. This solves the problem because when the "competing" logcontext exits, it will
277+
restore the sentinel logcontext which is never finished by its nature, so there is no
278+
warning and no leakage into the reactor.
279+
280+
```python
281+
async def main():
282+
with LoggingContext("main"):
283+
d = defer.Deferred()
284+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
285+
d.callback(None)
286+
with PreserveLoggingContext():
287+
# Call the callback with the sentinel logcontext.
288+
d.callback(None)
289+
# Good: This will be logged against the "main" logcontext
290+
logger.debug("phew")
291+
```
292+
293+
**Solution 3:** But let's say you *do* want to run (fire-and-forget) the deferred
294+
callback in the current context without running into issues:
295+
296+
We can solve the first issue by using `run_in_background(...)` to run the callback in
297+
the current logcontext and it handles the magic behind the scenes of a) restoring the
298+
calling logcontext before returning to the caller and b) resetting the logcontext to the
299+
sentinel after the deferred completes and we yield control back to the reactor to avoid
300+
leaking the logcontext into the reactor.
301+
302+
To solve the second issue, we can extend the lifetime of the "main" logcontext by
303+
avoiding the `LoggingContext`'s context manager lifetime methods
304+
(`__enter__`/`__exit__`). We can still set "main" as the current logcontext by using
305+
`PreserveLoggingContext` and passing in the "main" logcontext.
306+
307+
308+
```python
309+
async def main():
310+
main_context = LoggingContext("main")
311+
with PreserveLoggingContext(main_context):
312+
d = defer.Deferred()
313+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
314+
# The whole lambda will be run in the "main" logcontext. But we're using
315+
# a trick to return the deferred `d` itself so that `run_in_background`
316+
# will wait on that to complete and reset the logcontext to the sentinel
317+
# when it does to avoid leaking the "main" logcontext into the reactor.
318+
run_in_background(lambda: (d.callback(None), d)[1])
319+
# Good: This will be logged against the "main" logcontext
320+
logger.debug("phew")
321+
322+
...
323+
324+
# Wherever possible, it's best to finish the logcontext by calling `__exit__` at some
325+
# point. This allows us to catch bugs if we later try to erroneously restart a finished
326+
# logcontext.
327+
#
328+
# Since the "main" logcontext stores the `LoggingContext.previous_context` when it is
329+
# created, we can wrap this call in `PreserveLoggingContext()` to restore the correct
330+
# previous logcontext. Our goal is to have the calling context remain unchanged after
331+
# finishing the "main" logcontext.
332+
with PreserveLoggingContext():
333+
# Finish the "main" logcontext
334+
with main_context:
335+
# Empty block - We're just trying to call `__exit__` on the "main" context
336+
# manager to finish it. We can't call `__exit__` directly as the code expects us
337+
# to `__enter__` before calling `__exit__` to `start`/`stop` things
338+
# appropriately. And in any case, it's probably best not to call the internal
339+
# methods directly.
340+
pass
341+
```
342+
343+
The same thing applies if you have some deferreds stored somewhere which you want to
344+
callback in the current logcontext.
345+
346+
347+
### Deferred errbacks and cancellations
348+
349+
The same care should be taken when calling errbacks on deferreds. An errback and
350+
callback act the same in this regard (see section above).
351+
352+
```python
353+
d = defer.Deferred()
354+
d.addErrback(some_other_function)
355+
d.errback(failure)
356+
```
357+
358+
Additionally, cancellation is the same as directly calling the errback with a
359+
`twisted.internet.defer.CancelledError`:
360+
361+
```python
362+
d = defer.Deferred()
363+
d.addErrback(some_other_function)
364+
d.cancel()
365+
```
366+
367+
368+
369+
206370
## Fire-and-forget
207371

208372
Sometimes you want to fire off a chain of execution, but not wait for

tests/util/test_logcontext.py

Lines changed: 198 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -249,6 +249,204 @@ async def competing_callback() -> None:
249249
# Back to the sentinel context
250250
self._check_test_key("sentinel")
251251

252+
@logcontext_clean
253+
async def test_deferred_callback_await_in_current_logcontext(self) -> None:
254+
"""
255+
Test that calling the deferred callback in the current logcontext ("foo") and
256+
waiting for it to finish in a logcontext blocks works as expected.
257+
258+
Works because "always await your awaitables".
259+
260+
Demonstrates one pattern that we can use fix the naive case where we just call
261+
`d.callback(None)` without anything else. See the *Deferred callbacks* section
262+
of docs/log_contexts.md for more details.
263+
"""
264+
clock = Clock(reactor)
265+
266+
# Sanity check that we start in the sentinel context
267+
self._check_test_key("sentinel")
268+
269+
callback_finished = False
270+
271+
async def competing_callback() -> None:
272+
nonlocal callback_finished
273+
try:
274+
# The deferred callback should have the same logcontext as the caller
275+
self._check_test_key("foo")
276+
277+
with LoggingContext("competing"):
278+
await clock.sleep(0)
279+
self._check_test_key("competing")
280+
281+
self._check_test_key("foo")
282+
finally:
283+
# When exceptions happen, we still want to mark the callback as finished
284+
# so that the test can complete and we see the underlying error.
285+
callback_finished = True
286+
287+
with LoggingContext("foo"):
288+
d: defer.Deferred[None] = defer.Deferred()
289+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
290+
self._check_test_key("foo")
291+
d.callback(None)
292+
# The fix for the naive case is here (i.e. things don't work correctly if we
293+
# don't await here).
294+
#
295+
# Wait for `d` to finish before continuing so the "main" logcontext is
296+
# still active. This works because `d` already follows our logcontext
297+
# rules. If not, we would also have to use `make_deferred_yieldable(d)`.
298+
await d
299+
self._check_test_key("foo")
300+
301+
await clock.sleep(0)
302+
303+
self.assertTrue(
304+
callback_finished,
305+
"Callback never finished which means the test probably didn't wait long enough",
306+
)
307+
308+
# Back to the sentinel context
309+
self._check_test_key("sentinel")
310+
311+
@logcontext_clean
312+
async def test_deferred_callback_preserve_logging_context(self) -> None:
313+
"""
314+
Test that calling the deferred callback inside `PreserveLoggingContext()` (in
315+
the sentinel context) works as expected.
316+
317+
Demonstrates one pattern that we can use fix the naive case where we just call
318+
`d.callback(None)` without anything else. See the *Deferred callbacks* section
319+
of docs/log_contexts.md for more details.
320+
"""
321+
clock = Clock(reactor)
322+
323+
# Sanity check that we start in the sentinel context
324+
self._check_test_key("sentinel")
325+
326+
callback_finished = False
327+
328+
async def competing_callback() -> None:
329+
nonlocal callback_finished
330+
try:
331+
# The deferred callback should have the same logcontext as the caller
332+
self._check_test_key("sentinel")
333+
334+
with LoggingContext("competing"):
335+
await clock.sleep(0)
336+
self._check_test_key("competing")
337+
338+
self._check_test_key("sentinel")
339+
finally:
340+
# When exceptions happen, we still want to mark the callback as finished
341+
# so that the test can complete and we see the underlying error.
342+
callback_finished = True
343+
344+
with LoggingContext("foo"):
345+
d: defer.Deferred[None] = defer.Deferred()
346+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
347+
self._check_test_key("foo")
348+
# The fix for the naive case is here (i.e. things don't work correctly if we
349+
# don't `PreserveLoggingContext()` here).
350+
#
351+
# `PreserveLoggingContext` will reset the logcontext to the sentinel before
352+
# calling the callback, and restore the "foo" logcontext afterwards before
353+
# continuing the foo block. This solves the problem because when the
354+
# "competing" logcontext exits, it will restore the sentinel logcontext
355+
# which is never finished by its nature, so there is no warning and no
356+
# leakage into the reactor.
357+
with PreserveLoggingContext():
358+
d.callback(None)
359+
self._check_test_key("foo")
360+
361+
await clock.sleep(0)
362+
363+
self.assertTrue(
364+
callback_finished,
365+
"Callback never finished which means the test probably didn't wait long enough",
366+
)
367+
368+
# Back to the sentinel context
369+
self._check_test_key("sentinel")
370+
371+
@logcontext_clean
372+
async def test_deferred_callback_fire_and_forget_with_current_context(self) -> None:
373+
"""
374+
Test that it's possible to call the deferred callback with the current context
375+
while fire-and-forgetting the callback (no adverse effects like leaking the
376+
logcontext into the reactor or restarting an already finished logcontext).
377+
378+
Demonstrates one pattern that we can use fix the naive case where we just call
379+
`d.callback(None)` without anything else. See the *Deferred callbacks* section
380+
of docs/log_contexts.md for more details.
381+
"""
382+
clock = Clock(reactor)
383+
384+
# Sanity check that we start in the sentinel context
385+
self._check_test_key("sentinel")
386+
387+
callback_finished = False
388+
389+
async def competing_callback() -> None:
390+
nonlocal callback_finished
391+
try:
392+
# The deferred callback should have the same logcontext as the caller
393+
self._check_test_key("foo")
394+
395+
with LoggingContext("competing"):
396+
await clock.sleep(0)
397+
self._check_test_key("competing")
398+
399+
self._check_test_key("foo")
400+
finally:
401+
# When exceptions happen, we still want to mark the callback as finished
402+
# so that the test can complete and we see the underlying error.
403+
callback_finished = True
404+
405+
# Part of fix for the naive case is here (i.e. things don't work correctly if we
406+
# don't `PreserveLoggingContext(...)` here).
407+
#
408+
# We can extend the lifetime of the "foo" logcontext is to avoid calling the
409+
# context manager lifetime methods of `LoggingContext` (`__enter__`/`__exit__`).
410+
# And we can still set the current logcontext by using `PreserveLoggingContext`
411+
# and passing in the "foo" logcontext.
412+
with PreserveLoggingContext(LoggingContext("foo")):
413+
d: defer.Deferred[None] = defer.Deferred()
414+
d.addCallback(lambda _: defer.ensureDeferred(competing_callback()))
415+
self._check_test_key("foo")
416+
# Other part of fix for the naive case is here (i.e. things don't work
417+
# correctly if we don't `run_in_background(...)` here).
418+
#
419+
# `run_in_background(...)` will run the whole lambda in the current
420+
# logcontext and it handles the magic behind the scenes of a) restoring the
421+
# calling logcontext before returning to the caller and b) resetting the
422+
# logcontext to the sentinel after the deferred completes and we yield
423+
# control back to the reactor to avoid leaking the logcontext into the
424+
# reactor.
425+
#
426+
# We're using a lambda here as a little trick so we can still get everything
427+
# to run in the "foo" logcontext, but return the deferred `d` itself so that
428+
# `run_in_background` will wait on that to complete before resetting the
429+
# logcontext to the sentinel.
430+
#
431+
# type-ignore[call-overload]: This appears like a mypy type inference bug. A
432+
# function that returns a deferred is exactly what `run_in_background`
433+
# expects.
434+
#
435+
# type-ignore[func-returns-value]: This appears like a mypy type inference
436+
# bug. We're always returning the deferred `d`.
437+
run_in_background(lambda: (d.callback(None), d)[1]) # type: ignore[call-overload, func-returns-value]
438+
self._check_test_key("foo")
439+
440+
await clock.sleep(0)
441+
442+
self.assertTrue(
443+
callback_finished,
444+
"Callback never finished which means the test probably didn't wait long enough",
445+
)
446+
447+
# Back to the sentinel context
448+
self._check_test_key("sentinel")
449+
252450
async def _test_run_in_background(self, function: Callable[[], object]) -> None:
253451
clock = Clock(reactor)
254452

0 commit comments

Comments
 (0)