Client
PubSub
Tested/observed with v1.43.0 (default receive settings) and v2.3.0 (custom receive settings).
Expected behavior
Expect control messages (e.g. Ack, ModAck) to be sent by the client unobstructed by whatever work happens on the data fetching and processing side.
Actual behavior
We have been seeing problems similar to those described in the issues linked below. In a service that pulls messages from a number of subscriptions with varying levels of backlog, and processes events at varying rates (some of the streams effectively stalled), we see delays in fetching pending messages and elevated numbers of duplicate deliveries because the client's attempts to send back acks time out.
We first observed this with the v1 client, whose default receive settings use 10 goroutines. Switching to the v2 client resolves or mitigates the problem, but only if we keep the new default of 1 goroutine.
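For illustration, a minimal sketch of the receive path with the v1 client (project and subscription IDs are placeholders); `ReceiveSettings.NumGoroutines` is the knob referenced above, and keeping it at 1 mirrors the v2 default that avoids the problem for us:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	// Hypothetical project/subscription IDs, for illustration only.
	client, err := pubsub.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	sub := client.Subscription("my-subscription")
	// v1 defaults to 10 pull-stream goroutines; lowering this (1 is the v2
	// default) reduces contention for connections in the gRPC pool.
	sub.ReceiveSettings.NumGoroutines = 1

	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		// Slow or stalled processing here is what exposes the ack-timeout issue.
		m.Ack()
	})
	if err != nil {
		log.Fatal(err)
	}
}
```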
Investigation
The likely explanation is that when the client tries to send back acks, it fails to obtain a connection from the pool because the iterator workers are holding them on slow/blocking pull stream requests. Increasing the connection pool size also resolves the issue.
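As a concrete sketch of that workaround, the gRPC connection pool can be sized at client construction; the value here is arbitrary, chosen only to exceed the number of concurrent pull streams:

```go
import (
	"context"

	"cloud.google.com/go/pubsub"
	"google.golang.org/api/option"
)

func newClient(ctx context.Context, projectID string) (*pubsub.Client, error) {
	// Oversize the gRPC connection pool relative to ReceiveSettings.NumGoroutines
	// so ack/modack RPCs are not left waiting behind blocked pull streams.
	// The value 8 is illustrative, not a recommendation.
	return pubsub.NewClient(ctx, projectID,
		option.WithGRPCConnectionPool(8),
	)
}
```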
To put a belt and braces on this, ideally the client would use separate connection pools for data fetching and control messages. Otherwise it might at least be worth documenting this behaviour better and/or making the configuration check stricter, to help avoid such situations.
Additional context
Likely related in various ways to #1247, #1485, #9727, #10437, #1584.