Support psycopg as a PostgreSQL driver #18999
clokep wants to merge 47 commits into element-hq:develop
Conversation
Use the proper isolation level on psycopg, where it is no longer an int.
…tch() instead of execute_values() — execute_values() from psycopg2 can do insertions with fetch=False, but psycopg is better off inserting with execute_batch() instead (because of pipelining support)
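The pipelining rationale above concerns psycopg's `executemany()`. As a rough, driver-agnostic sketch of DB-API batch insertion (shown here with `sqlite3`, since neither postgres driver may be installed; psycopg additionally runs these statements in pipeline mode, and the table/column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, body TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]
# DB-API executemany(): one statement, many parameter tuples. psycopg
# implements this on top of its pipeline support, which is why the
# execute_batch()-style helpers from psycopg2 are unnecessary there.
conn.executemany("INSERT INTO events (id, body) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 3
```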
I think this is ready for a review. I'm sure it'll need a few changes, but overall I think it is sound.
reivilibre left a comment:
Thanks both of you, looks like a tremendous amount of work has gone into this!
From my perspective, this looks virtually ready apart from the really minor things I've pointed out. I'd say:
- As said, I would like more eyes on the COPY-based replacement for batch execution. It is probably fine, though.
- It looks like some optimisations have been walked back compared to psycopg2. I personally think this is fine, though it would be ideal to have a tracking issue containing a ticklist of these, so we can track their re-implementation as a prerequisite to considering psycopg support 'complete'/first-class/ready to replace psycopg2.
Other than that, it all looks clear and reasonable to me!
```python
def set_statement_timeout(self, cursor: Cursor, statement_timeout: int) -> None:
    """Configure the current cursor's statement timeout."""
    cursor.execute("SET statement_timeout TO ?", (statement_timeout,))
```
Slight preference for using the explicit SET SESSION form. But this is only cosmetic; no semantic change.
When the time comes to remove psycopg2, there is a property that will be set instead. I do not recall why we did not do this here, but we can probably rectify that in future work. It's probable that at the time of the initial bring-up, that property did not exist yet (but I would not quote that).
```python
    Generic[ConnectionType, CursorType, IsolationLevelType],
    BaseDatabaseEngine[ConnectionType, CursorType, IsolationLevelType],
    metaclass=abc.ABCMeta,
):
```
A mini docstring pointing out that this is (now) an abstract class, with Psycopg2Engine and PsycopgEngine as concrete implementations, might be nice.
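As a minimal sketch of the arrangement being suggested (class names from the PR; the method and structure are purely illustrative, not Synapse's actual interface):

```python
import abc


class BaseDatabaseEngine(abc.ABC):
    """Illustrative stand-in for the abstract postgres engine base class."""

    @abc.abstractmethod
    def module_name(self) -> str:
        """Name of the underlying driver module."""


class Psycopg2Engine(BaseDatabaseEngine):
    def module_name(self) -> str:
        return "psycopg2"


class PsycopgEngine(BaseDatabaseEngine):
    def module_name(self) -> str:
        return "psycopg"


print(Psycopg2Engine().module_name())  # psycopg2
print(PsycopgEngine().module_name())   # psycopg
```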
```python
# We use fetch = False to mean a writable query. You *might* be able
# to morph that into a COPY (...) FROM STDIN, but it isn't worth the
# effort for the few places we set fetch = False.
assert fetch is True

# execute_values requires a single replacement, but we need to expand it
# for COPY. This assumes all inner sequences are the same length.
value_str = "(" + ", ".join("?" for _ in next(iter(values))) + ")"
sql = sql.replace("?", ", ".join(value_str for _ in values))

# Wrap the SQL in the COPY statement.
sql = f"COPY ({sql}) TO STDOUT"

def f(
    the_sql: str, the_args: Sequence[Sequence[Any]]
) -> Iterable[Tuple[Any, ...]]:
    with self.txn.copy(the_sql, the_args) as copy:
        yield from copy.rows()

# Flatten the values.
return self._do_execute(f, sql, list(itertools.chain.from_iterable(values)))
```
I'm not saying there's anything wrong with this, but it looks non-trivial and I'd like to get an extra pair of eyes on it.
Slightly suspicious about `next(iter(values))` — would that advance an iterator such as a generator expression, in a way that would drop the first row?
> Slightly suspicious about `next(iter(values))` — would that advance an iterator such as a generator expression, in a way that would drop the first row?
This exact concern is obsoleted by d49827d (it now accepts a Collection, which shouldn't have this risk).
Still think I'd like to take a second look at this aspect with a fresh mind and/or get some extra eyes, but in principle it seems fine
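The concern is easy to demonstrate in isolation: `next(iter(...))` silently consumes an item from a one-shot iterator such as a generator, but not from a Collection (which is why accepting a Collection removes the risk). A small standalone example:

```python
# With a generator: next(iter(gen)) "peeks" by consuming -- the first row
# would be missing from any subsequent iteration.
gen = ((i, i * 2) for i in range(3))
first = next(iter(gen))
remaining = list(gen)
print(first, remaining)  # (0, 0) [(1, 2), (2, 4)]  -- row 0 is lost

# With a Collection: iter() yields a fresh iterator each time, so nothing
# is consumed from the underlying data.
coll = [(i, i * 2) for i in range(3)]
first = next(iter(coll))
print(first, len(coll))  # (0, 0) 3  -- all rows still present
```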
```python
    lambda the_sql: execute_batch(self.txn, the_sql, args), sql
)

# TODO Can psycopg3 do anything better?
```
Just opening this as a thread for discussion to see whether we care enough to either do this now, or open a ticket for it, or something else.
```python
txn.execute_values(sql, args, fetch=False)

# TODO Maybe improve for psycopg.
```
Just opening this as a thread for discussion to see whether we care enough to either do this now, or open a ticket for it, or something else.
| """ | ||
|
|
||
| if isinstance(txn.database_engine, PostgresEngine): | ||
| if isinstance(txn.database_engine, Psycopg2Engine): |
Maybe we should open a ticket about this as well. A solution might be to pass in an array and rely on UNNEST, but that is something I'm more than happy to defer to later; it would just be nice to remember to do it.
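The UNNEST idea would mean the SQL text stays fixed while the batch travels as a single array parameter, rather than the placeholder list growing with the batch. A rough illustration of the two query shapes (table and column names are invented, and the SQL here is only constructed, never executed):

```python
# Hypothetical batch of values to match against.
user_ids = ["@a:example.com", "@b:example.com"]

# Shape 1: one placeholder per value -- the SQL text changes with batch size.
per_value = "DELETE FROM pushers WHERE user_name IN (%s)" % ", ".join(
    "?" for _ in user_ids
)

# Shape 2: a single array parameter, unpacked server-side -- fixed SQL text.
via_unnest = "DELETE FROM pushers WHERE user_name = ANY(SELECT UNNEST(?))"

print(per_value)
print(via_unnest)
```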
```bash
else
    export PASS_SYNAPSE_COMPLEMENT_USE_WORKERS=

if [[ -n "$POSTGRES" ]]; then
    if [[ "$POSTGRES" = "psycopg" ]]; then
```
Judging from the CI yaml, this needs to be case-insensitive (to match `Psycopg` from the CI `matrix.database`); or am I wrong?
I imagine you are correct. I think I was using it from the command line and tested with its lower-case form. I will adjust to use the bash `${POSTGRES,,}` pattern instead, which lower-cases all the letters.
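For reference, a minimal demonstration of that expansion (bash 4+; the variable value is just an example):

```shell
#!/usr/bin/env bash
# ${VAR,,} lower-cases the whole value, making the driver comparison
# case-insensitive regardless of how CI spells it.
POSTGRES="Psycopg"
if [[ "${POSTGRES,,}" = "psycopg" ]]; then
  echo "matched"
fi
```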
This actually opened a can of worms. You were correct to bring this up, as it turned out that Complement in CI was not using this driver, for two reasons:
- I had not enabled recognition of this driver when in `WORKERS=1` mode 🤦♂️, which was how the example testing was set up.
- The Complement docker image will not actually run with psycopg in its "pure python" installation method without significant changes, which I am not willing to attempt at this time. After 4 hours of exploring how to make this work, I came to the conclusion that the easiest way past it is to not use the "pure python" installation, but to use either the "binary" or "c" installations instead. Neither of these has the same symptoms (these other installation types bundle the necessary libraries).
Some more details about point 2:

This was manifesting as:

```
synapse_main | **********************************************************************************
synapse_main | Error during initialisation:
synapse_main | Traceback (most recent call last):
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/app/homeserver.py", line 486, in main
synapse_main |     setup(hs)
synapse_main |     ~~~~~^^^^
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/app/homeserver.py", line 422, in setup
synapse_main |     hs.setup()
synapse_main |     ~~~~~~~~^^
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/server.py", line 636, in setup
synapse_main |     self.datastores = Databases(self.DATASTORE_CLASS, self)
synapse_main |                       ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/storage/databases/__init__.py", line 84, in __init__
synapse_main |     engine = create_engine(database_config.config)
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/storage/engines/__init__.py", line 74, in create_engine
synapse_main |     return PsycopgEngine(database_config)
synapse_main |   File "/usr/local/lib/python3.13/site-packages/synapse/storage/engines/__init__.py", line 48, in __new__
synapse_main |     raise RuntimeError(
synapse_main |         f"Cannot create {cls.__name__} -- psycopg module is not installed"
synapse_main |     )
synapse_main | RuntimeError: Cannot create PsycopgEngine -- psycopg module is not installed
synapse_main |
synapse_main | There may be more information in the logs.
synapse_main | **********************************************************************************
```
This suggested that the psycopg module was not even installed. The module does appear in the pip installation logs in CI, so this was a slight misdirection.
I then removed the exception catch for the importing of PsycopgEngine hoping it would give me more information. It did.
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.13/site-packages/synapse/app/homeserver.py", line 42, in <module>
    from synapse.app import _base
  File "/usr/local/lib/python3.13/site-packages/synapse/app/_base.py", line 75, in <module>
    from synapse.events.auto_accept_invites import InviteAutoAccepter
  File "/usr/local/lib/python3.13/site-packages/synapse/events/auto_accept_invites.py", line 28, in <module>
    from synapse.module_api import EventBase, ModuleApi, run_as_background_process
  File "/usr/local/lib/python3.13/site-packages/synapse/module_api/__init__.py", line 58, in <module>
    from synapse.handlers.auth import (
    ...<7 lines>...
    )
  File "/usr/local/lib/python3.13/site-packages/synapse/handlers/auth.py", line 57, in <module>
    from synapse.api.ratelimiting import Ratelimiter
  File "/usr/local/lib/python3.13/site-packages/synapse/api/ratelimiting.py", line 27, in <module>
    from synapse.storage.databases.main import DataStore
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/__init__.py", line 40, in <module>
    from synapse.storage.databases import Databases
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/databases/__init__.py", line 26, in <module>
    from synapse.storage._base import SQLBaseStore
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/_base.py", line 26, in <module>
    from synapse.storage.database import (
    ...<3 lines>...
    )
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/database.py", line 61, in <module>
    from synapse.storage.background_updates import BackgroundUpdater
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/background_updates.py", line 39, in <module>
    from synapse.storage.engines import PostgresEngine
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/engines/__init__.py", line 43, in <module>
    from .psycopg import PsycopgEngine
  File "/usr/local/lib/python3.13/site-packages/synapse/storage/engines/psycopg.py", line 18, in <module>
    import psycopg
  File "/usr/local/lib/python3.13/site-packages/psycopg/__init__.py", line 9, in <module>
    from . import pq  # noqa: F401 import early to stabilize side effects
    ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.13/site-packages/psycopg/pq/__init__.py", line 116, in <module>
    import_from_libpq()
    ~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.13/site-packages/psycopg/pq/__init__.py", line 108, in import_from_libpq
    raise ImportError(
    ...<4 lines>...
    )
ImportError: no pq wrapper available.
Attempts made:
- couldn't import psycopg 'c' implementation: No module named 'psycopg_c'
- couldn't import psycopg 'binary' implementation: No module named 'psycopg_binary'
- couldn't import psycopg 'python' implementation: libpq library not found
```
Inspection of the functions being called here (and some minor sleuthing on the psycopg issue tracker) led me to the conclusion that `ctypes.util.find_library("pq")` was not finding what it wanted, namely the system-level libpq5 library. libpq5 is installed in the base docker image, and does appear in the filesystem at the correct place in the final complement image, but it was not found by this function anyway. Using a rough `apt install -y libpq5` in the final complement image did allow it to work. But why was it not found even though it was present? The psycopg issue tracker suggested this can also be caused by passing data around between distro-less docker images, but that does not appear to be the case here at any point.
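A minimal sketch of the lookup that was failing (simplified; psycopg's real logic lives in `psycopg.pq.import_from_libpq`, and whether libpq is found depends on the machine running this):

```python
import ctypes.util

# This is roughly what psycopg's "python" implementation does at import
# time to locate the system libpq. In the Complement image, libpq5 was on
# disk yet this returned None, leading to the
# "couldn't import psycopg 'python' implementation: libpq library not found"
# branch of the ImportError above.
libname = ctypes.util.find_library("pq")

if libname is None:
    print("libpq not found by ctypes.util.find_library")
else:
    print(f"would load {libname}")
```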
Since the next stage of introducing psycopg support to Synapse will be changing the installation method to either binary or c anyway, spending time to figure out why this is not working seems like a waste. This is the easy solution; it just means the timetable gets moved up.
After much testing and some debate, the c extra was chosen. The binary extra does not have a source distribution (only wheels), and building debian packages was failing (this appeared to be because the bundled libs were not being found, even though they were verifiably in the correct location; I suspected this was an RPATH issue).
The minimum version of psycopg chosen was originally v3.1.x. With the c extra this was causing the 'old deps' trial run to fail. Bookworm packages a copy of libpq5 that was updated to support Postgres 18, but this also broke psycopg prior to v3.2.8. Since we control the version of psycopg but not of libpq5, the minimum version of psycopg was bumped to v3.2.8.
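The resulting constraint can be expressed as a simple version check (a hypothetical helper, not code from the PR; real dependency pins live in the packaging metadata):

```python
# Require psycopg >= 3.2.8: bookworm's libpq5 (with Postgres 18 support)
# breaks earlier psycopg releases when compiled against it.
MIN_PSYCOPG = (3, 2, 8)


def version_ok(version_str: str) -> bool:
    """Return True if a dotted version string meets the minimum."""
    parts = tuple(int(p) for p in version_str.split("."))
    return parts >= MIN_PSYCOPG


print(version_ok("3.2.8"), version_ok("3.1.9"))  # True False
```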
…iate on psycopg2 for now — this will use executemany() (inside of execute_batch()) from psycopg instead, which uses pipeline mode
… the concrete definitions
… for use with workers yet
…ged libpq5 having Postgres 18 support which breaks previous versions of psycopg when compiled
… driver (in POSTGRES=psycopg)
This adds support for psycopg (aka psycopg3) as a PostgreSQL driver in addition to psycopg2. This requires adding some abstractions between psycopg2 and psycopg although most of the queries can be identical (since it is just PostgreSQL underneath).
This is a branch that @realtyem and myself have been working on for several years.
The bulk of this PR is splitting the `PostgresEngine` into a `PsycopgEngine` and `Psycopg2Engine`, which use the correct underlying driver. We then need to update some if-statements in the code to differentiate between these. A bunch of configuration code also needs to be adjusted to handle having multiple engines for postgres.
This also updates the test matrix for trial to run against psycopg2 and psycopg.
This is step 1 of #14586.