Fix SQLite corruption on bucket-mounted Spaces #501
…ket-mounted Spaces

When Trackio runs on an HF Space with a bucket mount (hf-mount), the FUSE filesystem doesn't support file locking. SQLite depends on fcntl/flock for internal consistency, so even a single process can see corruption when locks are silently no-ops.

Fix: detect bucket-mount environment (TRACKIO_BUCKET_ID + SYSTEM=spaces) and switch to a single persistent connection with PRAGMA locking_mode=EXCLUSIVE. This tells SQLite to grab the lock once and never release it -- with only one connection open, there's no contention and no reliance on filesystem locking.

Also switches ProcessLock to use in-memory threading.Lock instead of file-based locking in the same environment, since file locks on the mount are unreliable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
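For readers who want to see the shape of the approach, here is a minimal sketch of a single persistent connection opened with `PRAGMA locking_mode=EXCLUSIVE` and guarded by an in-memory `threading.Lock`. The class name `ExclusiveStore`, the `metrics` table, and its methods are hypothetical and chosen for illustration; this is not the code in `trackio/sqlite_storage.py`.

```python
import sqlite3
import threading


class ExclusiveStore:
    """Hypothetical wrapper: one long-lived connection, serialized by a lock."""

    def __init__(self, db_path):
        self._lock = threading.Lock()
        # check_same_thread=False lets several threads share this connection;
        # the in-memory lock below is what keeps their access serialized.
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        # The pragma returns the locking mode now in effect.
        row = self._conn.execute("PRAGMA locking_mode=EXCLUSIVE").fetchone()
        self.locking_mode = row[0]
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS metrics (name TEXT, value REAL)"
            )

    def log(self, name, value):
        with self._lock:
            with self._conn:
                self._conn.execute(
                    "INSERT INTO metrics VALUES (?, ?)", (name, value)
                )

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]

    def close(self):
        with self._lock:
            self._conn.close()
```

In exclusive mode SQLite acquires the file lock on first access and holds it until the connection closes, so a second connection to the same file would be blocked; that is why the sketch funnels every caller through one connection.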
🪼 branch checks and previews
Install Trackio from this PR (includes built frontend):
pip install "https://huggingface.co/buckets/trackio/trackio-wheels/resolve/656398d01f18dbaf110edff38a4cd446992416ff/trackio-0.22.0-py3-none-any.whl"

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull request overview
This PR addresses SQLite corruption seen on Hugging Face Spaces when Trackio runs on a bucket-mounted (FUSE) filesystem where file locking is unreliable, by switching to an exclusive-locking + single-connection strategy in that environment.
Changes:
- Add bucket-mount detection (`TRACKIO_BUCKET_ID` + `SYSTEM=spaces`) and enable `PRAGMA locking_mode=EXCLUSIVE`.
- Introduce per-DB persistent SQLite connections and serialize access via in-memory locks when exclusive locking is enabled.
- Update `ProcessLock` to use an in-memory `threading.Lock` instead of file locks in the same environment, and add a unit test covering the exclusive mode behavior.
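As a rough illustration of the third change, the sketch below shows a context-manager lock backed by a per-key `threading.Lock` registry rather than a lock file. `InMemoryProcessLock` and its `key` argument are hypothetical names; the real `ProcessLock` in Trackio may be structured differently.

```python
import threading


class InMemoryProcessLock:
    """Hypothetical stand-in for a file-based lock.

    Only protects threads inside one process; it gives no protection
    against a second process touching the same database file.
    """

    _locks = {}
    _registry_guard = threading.Lock()

    def __init__(self, key):
        # One shared lock object per key (for example, per database path).
        with self._registry_guard:
            self._lock = self._locks.setdefault(key, threading.Lock())

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # never swallow exceptions from the guarded block
```

Used as `with InMemoryProcessLock("project.db"): ...`, every thread that passes the same key waits on the same lock object.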
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `trackio/sqlite_storage.py` | Adds exclusive-locking mode detection, persistent connections, and switches locking strategy in bucket-mounted Spaces. |
| `tests/unit/test_sqlite_storage.py` | Adds a unit test validating exclusive locking mode and persistent connection creation. |
| `.changeset/rare-olives-find.md` | Adds a changeset entry documenting the feature/fix. |
…p, fix docstring

- Narrow `except Exception` to `except sqlite3.Error` in persistent connection health check and cleanup
- Add `_close_all_persistent_connections()` with `atexit` hook to prevent leaking file descriptors in long-running processes
- Clarify ProcessLock docstring: in-memory threading.Lock is single-process only
- Add comment explaining why `configure_pragmas` is intentionally ignored in the exclusive-locking path
- Remove `test_exclusive_locking_not_enabled_by_default` test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covered by the e2e Spaces tests which exercise the real bucket-mount environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TRACKIO_BUCKET_ID is not set as a Space variable, so checking for it would miss the bucket-mount case. Simplify to just SYSTEM=spaces — a single persistent connection is the right default for any Space environment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
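The change in detection logic described in this commit can be sketched like this. Both function names are hypothetical; only the environment variable names `SYSTEM` and `TRACKIO_BUCKET_ID` and the value `spaces` come from the PR.

```python
import os


def earlier_check(environ=None):
    """Original condition: needs both variables, so it misses Spaces
    where TRACKIO_BUCKET_ID is not set as a Space variable."""
    env = os.environ if environ is None else environ
    return bool(env.get("TRACKIO_BUCKET_ID")) and env.get("SYSTEM") == "spaces"


def simplified_check(environ=None):
    """Simplified condition: any Space gets the single persistent connection."""
    env = os.environ if environ is None else environ
    return env.get("SYSTEM") == "spaces"
```

Passing a plain dict as `environ` makes the difference easy to see: with `{"SYSTEM": "spaces"}` alone, the earlier check is false while the simplified one is true.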
Unrelated change, removes PERSISTANT_STORAGE_ENABLED as we no longer have persistent storage on Spaces
This is relatively small so I'll go ahead and merge this in so that we can test this out in live Spaces.
I investigated this a bit with @XciD and found that it probably wasn't a locking issue. Previously (before this PR), we were trying to create a new SQLite connection on every request, and some of them got dropped because the SQLite lock took a long time to resolve. Now we have a single persistent connection that all requests use (each request just adds to an in-memory queue). I was not able to reproduce the stronger bucket-specific claim that HF bucket mounts silently ignore locks; in fact, in my tests the mounted bucket behaved the same as a local /tmp file.
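One way to read "each request just adds to an in-memory queue" is a single writer thread that owns the connection and drains a `queue.Queue`. The sketch below illustrates that pattern only; `QueuedWriter`, the `logs` table, and the method names are hypothetical and do not claim to match how Trackio implements it.

```python
import queue
import sqlite3
import threading


class QueuedWriter:
    """Hypothetical: requests enqueue rows; one thread writes them in order."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # The connection is created and used only inside this thread,
        # so no request ever opens (or waits on) its own connection.
        conn = sqlite3.connect(self._db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS logs (step INTEGER, value REAL)")
        conn.commit()
        while True:
            item = self._queue.get()
            if item is None:  # sentinel: stop after everything queued earlier
                break
            with conn:  # one committed transaction per queued row
                conn.execute("INSERT INTO logs VALUES (?, ?)", item)
        conn.close()

    def submit(self, step, value):
        """Called by request handlers; returns immediately."""
        self._queue.put((step, value))

    def close(self):
        self._queue.put(None)
        self._thread.join()
```

Because `submit` only touches the queue, a slow write delays later writes but does not cause a request to fail while waiting for a database lock.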
- When Trackio runs on an HF Space with a bucket mount (`hf-mount`), the FUSE filesystem doesn't support file locking. SQLite depends on `fcntl`/`flock` for internal consistency, so even a single process can see corruption when locks are silently no-ops.
- Detects the bucket-mount environment (`TRACKIO_BUCKET_ID` + `SYSTEM=spaces`) and switches to a single persistent connection with `PRAGMA locking_mode=EXCLUSIVE`. SQLite grabs the lock once and never releases it; with only one connection open, there's no contention and no reliance on filesystem locking.
- Switches `ProcessLock` to use in-memory `threading.Lock` instead of file-based locking in the same environment.

This is a much simpler alternative to the Parquet storage backend approach (~110 lines changed vs ~3500) proposed by Claude Code and reviewed by myself to solve the same root cause.
cc @Wauplin @XciD for visibility