
fix: use socket .get() instead of .getsockopt() to prevent deaf loop #8

Merged
jph00 merged 1 commit into main from fix/4min-timeout-bug
Jun 10, 2026

@PiotrCzapla

Contributor

Apparently the FD (the socket file descriptor that asyncio selects on) does not hold the data at all. ZMQ messages are stored in an in-memory queue by an IO thread, and the consumer is supposed to check the Events bitmap to see whether that queue has messages. The FD works like an asyncio.Event: it is set only when the Events bitmap is 0 and a new message arrives (i.e., the IO thread thinks something is waiting on the FD). Any ZMQ operation (including send) first reads the FD to update the Events bitmap, but asyncio's select on the FD ignores the bitmap. PyZMQ has machinery to handle the Events bitmap, but it only runs when needed. If something else (like a sync shadow socket) updates the Events bitmap, draining the FD, the asyncio socket never notices, and the read sleeps forever.
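The stall described above can be sketched with a toy model. This is not libzmq's implementation: `ToySocket`, `io_thread_deliver`, and `check_events` are hypothetical names, and the model assumes the FD is signalled only on the empty-to-non-empty transition and that any events check drains that signal.

```python
from collections import deque


class ToySocket:
    """Toy stand-in for a ZMQ socket (hypothetical, for illustration only)."""

    def __init__(self):
        self.queue = deque()      # in-memory message queue filled by the IO thread
        self.fd_readable = False  # what a selector watching the FD would see

    def io_thread_deliver(self, msg):
        # Assumption: the FD is signalled only when the queue goes from
        # empty to non-empty (edge-triggered), not once per message.
        was_empty = not self.queue
        self.queue.append(msg)
        if was_empty:
            self.fd_readable = True

    def check_events(self):
        # Assumption: checking events drains the FD signal and reports
        # whether the in-memory queue currently holds messages.
        self.fd_readable = False
        return bool(self.queue)


sock = ToySocket()
sock.io_thread_deliver("reply-1")
fd_before = sock.fd_readable           # the edge is visible to a selector

# Something other than the asyncio reader (e.g. a sync shadow socket)
# checks events and drains the FD without reading the message.
shadow_saw_data = sock.check_events()
fd_after = sock.fd_readable

# A second message arrives while the queue is non-empty: no new edge.
sock.io_thread_deliver("reply-2")
print(fd_before, shadow_saw_data, fd_after, sock.fd_readable, len(sock.queue))
```

In this model, a reader that waits only on `fd_readable` never wakes up, even though two messages are queued. Someone has to reschedule it explicitly after the drain.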

Our stream.getsockopt(zmq.EVENTS) call was supposed to schedule the handler back on the asyncio loop. The problem is a bug in zmq.asyncio.Socket: getsockopt is synchronous and is not overridden in the async version, so it never adds the handler.

The reason for that is this:

class Socket(SocketBase):
    getsockopt = SocketBase.get  # alias bound to the base-class get when the class is created

and zmq.asyncio.Socket only overrides get, so getsockopt still points at the base implementation.
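The aliasing pitfall can be reproduced without pyzmq. The sketch below uses stand-in classes: `AsyncSocket` and its `rescheduled` flag are hypothetical, and only the alias line mirrors the pattern quoted above.

```python
class SocketBase:
    def get(self, option):
        return f"sync get({option})"


class Socket(SocketBase):
    # The alias captures the SocketBase.get function object right here,
    # so later overrides of get() in subclasses do not affect it.
    getsockopt = SocketBase.get


class AsyncSocket(Socket):
    """Hypothetical async subclass that overrides only get()."""

    def __init__(self):
        self.rescheduled = False

    def get(self, option):
        result = super().get(option)
        self.rescheduled = True  # stands in for re-adding the asyncio handler
        return result


s = AsyncSocket()

s.getsockopt("EVENTS")             # goes straight to SocketBase.get
via_alias = s.rescheduled          # override was bypassed

s.get("EVENTS")                    # goes through the override
via_override = s.rescheduled

print(via_alias, via_override)
```

Calling through the alias skips the subclass logic entirely. This is why switching the call site to `get` is enough to restore the rescheduling.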

So the fix is what Opus suggested: call stream.get(zmq.EVENTS) instead of stream.getsockopt(zmq.EVENTS).

The issue was so hard to track down because when this code runs inside another Jupyter kernel, it takes the io_thread branch, which correctly uses get instead of getsockopt.

This fixes a bug where getsockopt (which pyzmq aliases to SocketBase.get at class level) bypassed the async override in zmq.asyncio.Socket. That bypass drained the FD edge-trigger without rescheduling the asyncio loop, causing the shell channel reader to stall on subsequent replies.
@jph00

jph00 commented Jun 10, 2026

Contributor

Nice!

@jph00 jph00 merged commit a9af9f1 into main Jun 10, 2026
2 checks passed