Synapse takes forever to send large responses. In some cases, just sending the response takes longer than processing the request and encoding it (`encode_json_response`) combined.
Examples
98s to process the request and `encode_json_response`, but the request isn't finished sending until 484s (8 minutes), which is 6.5 minutes of dead time. The response size is 36MB.
Jaeger trace: 4238bdbadd9f3077.json
59s to process and finishes after 199s. The response size is 36MB.
Jaeger trace: 2149cc5e59306446.json
I've come across this before and it's not a new thing. For example, in #13620 (original issue), I described it as the "mystery gap at the end after we encode the JSON response (`encode_json_response`)" but never encountered it being this egregious.
It can also happen for small requests: 2s to process and finishes after 5s. The response size is 120KB.
Investigation
@kegsay pointed out `_write_bytes_to_request`, which runs after `encode_json_response` and has comments like "Write until there's backpressure telling us to stop." that definitely hint at some areas of interest.
`synapse/http/server.py` (lines 869 to 873 in 03937a1):

```python
with start_active_span("encode_json_response"):
    span = active_span()
    json_str = await defer_to_thread(request.reactor, encode, span)

_write_bytes_to_request(request, json_str)
```
The JSON serialization is done in a background thread because it can block the reactor for many seconds. This part seems normal and fast (no problem).
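The pattern here is the standard one for keeping CPU-bound work off the event loop. A minimal sketch of the same idea using stdlib `asyncio` rather than Synapse's actual `defer_to_thread` helper (the `encode`/`respond` names are illustrative, not Synapse's API):

```python
import asyncio
import json
from concurrent.futures import ThreadPoolExecutor

def encode(data: dict) -> bytes:
    # Blocking CPU-bound work: for a 36MB response this can take seconds,
    # so it must not run on the event-loop (reactor) thread.
    return json.dumps(data).encode("utf-8")

async def respond(data: dict) -> bytes:
    loop = asyncio.get_running_loop()
    # Equivalent in spirit to defer_to_thread(request.reactor, encode, ...):
    # hand the blocking encode to a worker thread and await the result,
    # leaving the event loop free to service other requests meanwhile.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return await loop.run_in_executor(pool, encode, data)

result = asyncio.run(respond({"events": list(range(5))}))
print(result)  # b'{"events": [0, 1, 2, 3, 4]}'
```

This matches the observation in the traces: the encode step itself is fast relative to the mystery gap, so the slowness must come after it, in the write path.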
But we also use `_ByteProducer` to send the bytes down to the client. Using a producer ensures we can send all of the bytes to the client without hitting a 60s timeout (see context in the comments below).
`synapse/http/server.py` (lines 883 to 889 in d40bc27):

```python
# The problem with dumping all of the response into the `Request` object at
# once (via `Request.write`) is that doing so starts the timeout for the
# next request to be received: so if it takes longer than 60s to stream back
# the response to the client, the client never gets it.
#
# The correct solution is to use a Producer; then the timeout is only
# started once all of the content is sent over the TCP connection.
```
This logic was added in:
Some extra time is expected since we're working with the reactor instead of blocking it, but it seems like something isn't tuned optimally (chunk size, starting/stopping the producer too much, etc.).
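The tuning intuition can be sanity-checked with arithmetic. A back-of-envelope sketch for the 36MB response from the traces; the ~50ms per-chunk cost is a pure assumption for illustration, not a measurement:

```python
# Dead-time scales roughly with (number of chunks) x (per-chunk reactor
# overhead), so chunk size matters a lot on a busy reactor.
response_bytes = 36 * 1024 * 1024  # the 36MB response from the traces

overhead = {}
for chunk_kib in (4, 64, 1024):
    chunks = response_bytes / (chunk_kib * 1024)
    # Assume ~50ms of reactor latency per chunk (assumption, not measured).
    overhead[chunk_kib] = chunks * 0.050
    print(f"{chunk_kib:>5} KiB chunks -> {chunks:>5.0f} chunks, ~{overhead[chunk_kib]:.0f}s")
```

Under these assumed numbers, 4KiB chunks give ~9216 reactor round-trips and hundreds of seconds of overhead, while 1MiB chunks give only ~36; the real per-chunk cost would need profiling, but the scaling shows why chunk size and pause/resume frequency are worth investigating.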