Synapse takes forever to send large responses. In some cases, just sending the response takes longer than processing the request and encoding it (`encode_json_response`) combined.
Examples
98s to process the request and `encode_json_response`, but the request isn't finished sending until 484s (8 minutes), which is 6.5 minutes of dead time. The response size is 36MB.
Jaeger trace: 4238bdbadd9f3077.json
59s to process and finishes after 199s. The response size is 36MB.
Jaeger trace: 2149cc5e59306446.json
I've come across this before and it's not a new thing. For example, in #13620 (original issue), I described it as the "mystery gap at the end after we encode the JSON response (`encode_json_response`)" but never encountered it being this egregious.
It can also happen for small requests: 2s to process and finishes after 5s. The response size is 120KB.
Investigation
@kegsay pointed out `_write_bytes_to_request`, which runs after `encode_json_response` and has comments like "Write until there's backpressure telling us to stop." that definitely hint at some areas of interest.
`synapse/http/server.py` (lines 869 to 873 in 03937a1):

```python
with start_active_span("encode_json_response"):
    span = active_span()
    json_str = await defer_to_thread(request.reactor, encode, span)

_write_bytes_to_request(request, json_str)
```
The JSON serialization is done in a background thread because it can block the reactor for many seconds. This part seems normal and fast (no problem).
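The pattern here is the standard one for keeping CPU-bound work off the event loop. A minimal sketch of the same idea using stdlib `asyncio` rather than Synapse's actual `defer_to_thread` helper (the `encode`/`respond` names are illustrative, not Synapse's API):

```python
import asyncio
import json
from concurrent.futures import ThreadPoolExecutor

def encode(data: dict) -> bytes:
    # Blocking CPU-bound work: for a 36MB response this can take seconds,
    # so it must not run on the event-loop (reactor) thread.
    return json.dumps(data).encode("utf-8")

async def respond(data: dict) -> bytes:
    loop = asyncio.get_running_loop()
    # Equivalent in spirit to defer_to_thread(request.reactor, encode, ...):
    # hand the blocking encode to a worker thread and await the result,
    # leaving the event loop free to service other requests meanwhile.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return await loop.run_in_executor(pool, encode, data)

result = asyncio.run(respond({"events": list(range(5))}))
print(result)  # b'{"events": [0, 1, 2, 3, 4]}'
```

This matches the observation in the traces: the encode step itself is fast relative to the mystery gap, so the slowness must come after it, in the write path.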
But we also use `_ByteProducer` to send the bytes down to the client. Using a producer ensures we can send all of the bytes to the client without hitting a 60s timeout (see context in the comments below).
`synapse/http/server.py` (lines 883 to 889 in d40bc27):

```python
# The problem with dumping all of the response into the `Request` object at
# once (via `Request.write`) is that doing so starts the timeout for the
# next request to be received: so if it takes longer than 60s to stream back
# the response to the client, the client never gets it.
#
# The correct solution is to use a Producer; then the timeout is only
# started once all of the content is sent over the TCP connection.
```
This logic was added in:
Some extra time is expected since we're working with the reactor instead of blocking it, but it seems like something isn't tuned optimally (chunk size, starting/stopping the producer too much, etc.).
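The tuning intuition can be sanity-checked with arithmetic. A back-of-envelope sketch for the 36MB response from the traces; the ~50ms per-chunk cost is a pure assumption for illustration, not a measurement:

```python
# Dead-time scales roughly with (number of chunks) x (per-chunk reactor
# overhead), so chunk size matters a lot on a busy reactor.
response_bytes = 36 * 1024 * 1024  # the 36MB response from the traces

overhead = {}
for chunk_kib in (4, 64, 1024):
    chunks = response_bytes / (chunk_kib * 1024)
    # Assume ~50ms of reactor latency per chunk (assumption, not measured).
    overhead[chunk_kib] = chunks * 0.050
    print(f"{chunk_kib:>5} KiB chunks -> {chunks:>5.0f} chunks, ~{overhead[chunk_kib]:.0f}s")
```

Under these assumed numbers, 4KiB chunks give ~9216 reactor round-trips and hundreds of seconds of overhead, while 1MiB chunks give only ~36; the real per-chunk cost would need profiling, but the scaling shows why chunk size and pause/resume frequency are worth investigating.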