Skip to content

Ajmaidak try next host on timeout#637

Merged
larskanis merged 10 commits intoged:masterfrom
larskanis:ajmaidak-try-next-host-on-timeout
Jul 11, 2025
Merged

Ajmaidak try next host on timeout#637
larskanis merged 10 commits intoged:masterfrom
larskanis:ajmaidak-try-next-host-on-timeout

Conversation

@larskanis
Copy link
Copy Markdown
Collaborator

@larskanis larskanis commented Apr 8, 2025

Closes #618
Closes #619

@larskanis larskanis force-pushed the ajmaidak-try-next-host-on-timeout branch 5 times, most recently from f85f5c0 to bcd57a7 Compare May 3, 2025 13:42
@larskanis larskanis force-pushed the ajmaidak-try-next-host-on-timeout branch 3 times, most recently from dbf2f6a to fd306c0 Compare July 11, 2025 06:35
Alexander J. Maidak and others added 10 commits July 11, 2025 10:29
libpq does not support connect_timeout when using the async api.

So when using the async connection API  with a multi-host connection
string libpq will not timeout connections to the first host in the
list and thus will not attempt to connect to any subsequent hosts
in the connection string list.

This fixes this by closing and reopenning the connection with a
reordered connection string when connection times out.

See discussion on the pgsql-hackers list discussing this "feature" of the api:
https://www.postgresql.org/message-id/flat/CA%2Bmi_8YyGKA9dWELu63e%3DKL2oN-%2BFe4uca4EtFfb6uQD4Up8pw%40mail.gmail.com
in case of several hosts and async API.

This is what libpq does in sync API as well.
When rotating hosts, they are tried twice, if combined with hosts which fail with some other error.
When other errors occure, then add timeout errors to the libpq message list.

Also finish the connection object before raise to avoid keeping fds open.

Use a higher timeout on Windows, since the "Connection refused" error is raised after 2 seconds.

Remove unused parameter accept: from ListenSocket
Until now the connect_timeout handling worked like so:

- All hosts are passed to connect_start
- If a timeout happens then the related host is removed from the hosts list
- This runs until no timeout happens and either a connection is established or aborted with CONNECTION_BAD

- The downside is that this might connect multiple times to one and the same host.

This changes the connect_timeout handling like so:

- All hosts are passed to connect_start
- As soon as the host is tried to connect the related host is removed from the hosts list
- This runs until no hosts are left after timeout or either a connection is established or aborted with CONNECTION_BAD

- The downside is that this connects only once to hosts which are listed twice.

This also fixes the "socket not connected" issue on Windows.
And it harmonizes the handling of connection parameters in CancelConnection.
@larskanis larskanis force-pushed the ajmaidak-try-next-host-on-timeout branch from fd306c0 to ee11bfa Compare July 11, 2025 08:33
@larskanis larskanis merged commit e0ce86e into ged:master Jul 11, 2025
17 of 22 checks passed
@larskanis larskanis deleted the ajmaidak-try-next-host-on-timeout branch July 11, 2025 15:39
@larskanis
Copy link
Copy Markdown
Collaborator Author

I added some documentation to this PR in commit 325e22b .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

async connection API does not properly handle timeouts with multiple hosts

1 participant