fix(db): dispose engine connection pool at process exit #256

Open
JanSobus wants to merge 1 commit into reanahub:master from JanSobus:fix/close-db-connections-on-exit

@JanSobus

What this PR does

Registers engine.dispose() as an atexit handler in reana_db/database.py so that all pooled database connections are closed cleanly, with a proper PostgreSQL Terminate message, before any process using reana-db exits.

Root cause

The module-level engine in database.py holds a SQLAlchemy connection pool. When a short-lived process exits (e.g. a CronJob container), this pool is garbage-collected without calling engine.dispose(). Python's GC does not send PostgreSQL Terminate messages, so the server sees an abrupt disconnect and logs:

LOG: unexpected EOF on client connection with an open transaction

For Flask-based commands, this is compounded by teardown_appcontext calling only session.remove(), which returns connections to the pool but does not dispose of the pool itself.
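The difference between returning a connection to the pool and disposing of the pool can be illustrated with a toy model. This is a sketch only: `ToyPool`, `checkout`, `checkin` and `is_open` are hypothetical names, not SQLAlchemy's API, and `sqlite3` stands in for a PostgreSQL driver so the example runs without a server.

```python
import sqlite3


class ToyPool:
    """Hypothetical stand-in for a connection pool (not SQLAlchemy's API)."""

    def __init__(self, database=":memory:"):
        self._database = database
        self._idle = []  # connections checked back in, but still open

    def checkout(self):
        # Reuse an idle connection if there is one, otherwise open a new one.
        return self._idle.pop() if self._idle else sqlite3.connect(self._database)

    def checkin(self, conn):
        # Analogous to session.remove(): the connection goes back to the
        # pool and stays open for reuse.
        self._idle.append(conn)

    def dispose(self):
        # Analogous to engine.dispose(): every idle connection is closed.
        while self._idle:
            self._idle.pop().close()

    def idle_count(self):
        return len(self._idle)


def is_open(conn):
    # Using a closed sqlite3 connection raises ProgrammingError.
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:
        return False


pool = ToyPool()
conn = pool.checkout()
pool.checkin(conn)
print(pool.idle_count(), is_open(conn))  # 1 True
pool.dispose()
print(pool.idle_count(), is_open(conn))  # 0 False
```

After `checkin` the connection is still open and owned by the pool; only `dispose` actually closes it, which is the step that never happens when the process simply exits.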

Affected CronJob containers

All four CronJob containers that use reana-db are fixed by this single change, since they all share the same module-level engine:

| CronJob | Command |
| --- | --- |
| reana-system-status | flask reana-admin status-report |
| reana-retention-rules-apply | flask reana-admin retention-rules-apply |
| reana-resource-quota-update | reana-db quota resource-usage-update |
| reana-interactive-session-cleanup | flask reana-admin interactive-session-cleanup |

Note: reana-interactive-session-cleanup was not mentioned in the original issue but is affected by the same root cause.
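One registration covers every command because a module body runs only once per interpreter: later imports are served from `sys.modules`. The sketch below assumes a hypothetical module `shared_engine_demo` (written to a temporary directory, with a dummy `Engine` class instead of SQLAlchemy) standing in for reana_db/database.py, and imports it from two simulated entry points.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module standing in for reana_db/database.py: it builds one
# module-level "engine" and registers its cleanup when first imported.
MODULE_SOURCE = textwrap.dedent(
    """
    import atexit

    REGISTRATIONS = []


    class Engine:
        def dispose(self):
            pass


    engine = Engine()
    atexit.register(engine.dispose)
    REGISTRATIONS.append(engine.dispose)
    """
)


def import_twice(module_name):
    # Simulates two different command modules importing the shared module.
    # The second import is served from sys.modules, so the body runs once.
    first = importlib.import_module(module_name)
    second = importlib.import_module(module_name)
    return first, second


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "shared_engine_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        a, b = import_twice("shared_engine_demo")
        print(a is b, a.engine is b.engine, len(a.REGISTRATIONS))  # True True 1
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("shared_engine_demo", None)
```

Each CronJob container is its own process, so each one imports the module, and therefore registers the handler, exactly once.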

The fix

import atexit
atexit.register(engine.dispose)

atexit is part of Python's standard library, so no new package dependency is needed. The handler fires once, at normal interpreter shutdown, regardless of how the process was started (Flask CLI, plain Click CLI, or long-running server). As with any atexit handler, it does not run if the process is killed by a signal that Python does not handle or exits via os._exit(). For long-running services this is equally correct: calling engine.dispose() on graceful shutdown cleanly closes all idle pooled connections.
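To check when an atexit handler does and does not run, the sketch below starts a child interpreter that registers a print-based stand-in for engine.dispose() (hypothetical; no SQLAlchemy is involved) and then exits in one of three ways.

```python
import subprocess
import sys
import textwrap

# Child program: registers a hypothetical stand-in for engine.dispose and
# then exits in the way selected by its first command-line argument.
CHILD = textwrap.dedent(
    """
    import atexit, os, sys

    def dispose():
        print("dispose called", flush=True)

    atexit.register(dispose)

    mode = sys.argv[1]
    if mode == "return":
        pass  # fall off the end of the script
    elif mode == "sys.exit":
        sys.exit(3)  # raises SystemExit; atexit handlers still run
    elif mode == "os._exit":
        os._exit(0)  # immediate exit; atexit handlers are skipped
    """
)


def handler_ran(mode):
    # Returns how many times the child's handler printed its message.
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.stdout.count("dispose called")


for mode in ("return", "sys.exit", "os._exit"):
    print(mode, handler_ran(mode))
# return 1
# sys.exit 1
# os._exit 0
```

A CronJob command that finishes normally falls into the first two cases. For long-running services the same reasoning applies to SIGTERM: Python does not install a SIGTERM handler by default, so the process is terminated without running atexit handlers unless the service provides a graceful shutdown path (for example, a signal handler that calls sys.exit()).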

Tests

A unit test (tests/test_database.py) verifies that engine.dispose is registered as an atexit handler by reloading the module with a mocked atexit.register and asserting it was called.
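The test file itself is not reproduced here. As a sketch of the same reload-and-mock technique, the unittest case below writes a hypothetical fake_database module (standing in for reana_db.database, with a dummy Engine class instead of SQLAlchemy) to a temporary directory, reloads it while atexit.register is patched, and asserts that the engine's dispose method was registered.

```python
import importlib
import sys
import tempfile
import textwrap
import unittest
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for reana_db/database.py.
MODULE_SOURCE = textwrap.dedent(
    """
    import atexit


    class Engine:
        def dispose(self):
            pass


    engine = Engine()
    atexit.register(engine.dispose)
    """
)


class TestDisposeRegisteredAtExit(unittest.TestCase):
    def setUp(self):
        self._tmp = tempfile.TemporaryDirectory()
        Path(self._tmp.name, "fake_database.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, self._tmp.name)
        importlib.invalidate_caches()
        self.module = importlib.import_module("fake_database")

    def tearDown(self):
        sys.path.remove(self._tmp.name)
        sys.modules.pop("fake_database", None)
        self._tmp.cleanup()

    def test_engine_dispose_registered(self):
        # Patch atexit.register, then re-execute the module body so the
        # registration call happens while the mock is in place.
        with mock.patch("atexit.register") as register:
            module = importlib.reload(self.module)
        # Bound methods of the same object compare equal.
        register.assert_called_once_with(module.engine.dispose)
```

Run it with python -m unittest or pytest. The reload is what makes the registration observable: it happens in the module body, which only re-executes, and therefore only reaches the mock, when the module is reloaded.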

Verifying the fix end-to-end

The unit test confirms the registration, but the most direct proof is observing PostgreSQL's connection logs before and after. To reproduce on any deployment with access to PostgreSQL logs:

1. Enable connection logging:

ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();

2. Run a CronJob command without the fix:

reana-db quota resource-usage-update

3. Check PostgreSQL logs — you'll see:

LOG: unexpected EOF on client connection with an open transaction

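4. Apply this fix and run the same command again. The unexpected EOF message should no longer appear; only the regular disconnection entry is logged:

LOG: disconnection: session time: 0:00:00.012 user=postgres database=postgres ...

The absence of the unexpected EOF message, with only the disconnection entry logged, confirms that engine.dispose() sent the proper Terminate message before the process exited.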
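The before/after comparison can also be scripted. The helper below, summarize_disconnects, is hypothetical: it matches only on the message text of the two PostgreSQL log messages quoted above (so it is independent of log_line_prefix) and treats a run as clean when no unexpected EOF message appears. How the log lines are collected depends on the deployment and is left out.

```python
import re

UNCLEAN = re.compile(r"unexpected EOF on client connection")
DISCONNECTION = re.compile(r"disconnection: session time:")


def summarize_disconnects(log_lines):
    """Hypothetical helper: tally disconnect-related PostgreSQL log messages.

    A run is treated as clean when no 'unexpected EOF' message appears.
    """
    unclean = sum(1 for line in log_lines if UNCLEAN.search(line))
    disconnections = sum(1 for line in log_lines if DISCONNECTION.search(line))
    return {"unclean": unclean, "disconnections": disconnections, "clean": unclean == 0}


# Sample lines copied from the messages quoted in the steps above.
before_fix = [
    "LOG: unexpected EOF on client connection with an open transaction",
]
after_fix = [
    "LOG: disconnection: session time: 0:00:00.012 user=postgres database=postgres ...",
]
print(summarize_disconnects(before_fix))
# {'unclean': 1, 'disconnections': 0, 'clean': False}
print(summarize_disconnects(after_fix))
# {'unclean': 0, 'disconnections': 1, 'clean': True}
```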

Closes reanahub/reana#943

Register engine.dispose() as an atexit handler so that all pooled
database connections are cleanly closed before any process using
reana-db exits. Without this, the connection pool is garbage-collected
without sending PostgreSQL Terminate messages, causing the server to
log "unexpected EOF on client connection with an open transaction"
at the end of every CronJob run.

This fixes the noise produced by the reana-retention-rules-apply,
reana-resource-quota-update, reana-system-status, and
reana-interactive-session-cleanup CronJob containers, all of which
share the module-level engine defined here.

Closes reanahub/reana#943
@JanSobus force-pushed the fix/close-db-connections-on-exit branch from 8bcbd14 to 1229a02 on April 26, 2026 05:58

Development

Successfully merging this pull request may close these issues.

SQLAlchemy cronjobs do not cleanly close their connections
