Skip to content

Make pgconn_connect_poll close the socket prior to calling PQconnectPoll#564

Merged
larskanis merged 1 commit intoged:masterfrom
jcalvert:fix_connect_poll_fd_reuse
Apr 9, 2024
Merged

Make pgconn_connect_poll close the socket prior to calling PQconnectPoll#564
larskanis merged 1 commit intoged:masterfrom
jcalvert:fix_connect_poll_fd_reuse

Conversation

@jcalvert
Copy link
Copy Markdown
Contributor

@jcalvert jcalvert commented Apr 8, 2024

Make pgconn_connect_poll close the socket prior to calling PQconnectPoll

Since PQconnectPoll can change the underlying socket and thus the file descriptor, closing the socket after can manifest a race condition where libpq has closed the socket and another thread has opened a socket with the recycled file descriptor. When the original socket is closed from Ruby, the VM will notify the new thread that the FD has closed (erroneously) resulting in an exception.

This is an exception I encountered in a production environment (C Ruby 3.3), somewhat sporadically It is quite misleading as the way this manifests is that the exception raises while waiting on the socket, so it seems as if connections are getting shared between threads.

The reason this happens is because when the IO object is closed on the Ruby side, the VM calls rb_notify_fd_close which in turn sends an exception to any threads waiting on the file descriptor. On the Ruby 2.x path does not have this problem as IO.select will return the exception array, although I believe the VM still calls close()

`PQconnectPoll`

Since `PQconnectPoll` can change the underlying socket and thus the file
descriptor, closing the socket after can manifest a race condition
where libpq has closed the socket and another thread has opened a socket
with the recycled file descriptor. When the original socket is closed
from Ruby, the VM will notify the new thread that the FD has closed
(erroneously) resulting in an exception.

The same for `PQresetPoll`.
@larskanis larskanis force-pushed the fix_connect_poll_fd_reuse branch from 0214687 to 21a064b Compare April 9, 2024 13:30
@larskanis larskanis merged commit 8eec744 into ged:master Apr 9, 2024
@larskanis
Copy link
Copy Markdown
Collaborator

Thank you for doing the analysis and the fix!

I changed PQresetPoll equally and moved the test to its own it example. The test still shows the issue, without running the code of the previous test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants