fix: use socket .get() instead of .getsockopt() to prevent deaf loop#8
Merged
Conversation
This fixes a bug where getsockopt (which pyzmq aliases to SocketBase.get at class level) bypassed the async override in zmq.asyncio.Socket. That bypass drained the FD edge-trigger without rescheduling the asyncio loop, causing the shell channel reader to stall on subsequent replies.
Contributor
|
Nice! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Apparently the FD (the socket file descriptor that asyncio
selects on) does not hold the data at all. The ZMQ messages are stored in a memory queue via an IO thread, and the consumer is supposed to check the Events bitmap to see if it has messages in the memory queue or not. The FD works like anasyncio.Event- it gets set only when the Events bitmap is 0 and there is new message (i.e., the IO thread thinks something is waiting on the FD). Any ZMQ operation (including send) first reads the FD to set the Events bitmap. But asyncio'sselecton the FD ignores the bitmap. PyZMQ has machinery to handle the Events bitmap, but it is only done when needed. If something (like a sync shadow socket) sets the Events map (draining the FD), the asyncio socket will never handle that (and read will sleep forever).Our
stream.getsockopt(zmq.EVENTS)was supposed to schedule the handler back onasyncio. The issue, though, is thatzmq.asyncio.Sockethas a bug, andgetsockoptis sync, and is not overwritten in async version, so it does not add the handler.The reason for that is this:
and
zmq.asyncio.Socketonly overwritesget, leavinggetsockoptnot overwritten.So the fix is what opus suggested running
stream.get(zmq.EVENTS)instead ofgetsockoptThe issue was so hard to tackle because when this code is running in side of another Jupyter kernel it uses
io_threadbranch and correctly usesgetinstead ofgetsockopt.