Read-atomic/crash-atomic NodeFSStorageAdapter by expede · Pull Request #600 · automerge/automerge-repo

expede · 2026-04-19T00:12:11Z

The NodeFS storage backend can fail and leave torn writes. This PR improves the NodeFS write safety guarantees for POSIX and Windows targets. Possibly controversial: we explicitly fsync on POSIX, which adds roughly 100 µs to 1 ms per write (depending on the specific SSD hardware) in exchange for durability guarantees. This doesn't get us to fully transactional writes (with CAS before/after gates, WAL, etc.), but it SIGNIFICANTLY improves atomic write reliability with read-/crash-atomicity.

Anecdotally: we were getting lots of torn writes in pushwork (due to an early exit bug) — we switched to this before fixing pushwork and haven't seen any torn writes since.

We've rebased subductionjs over this PR. That branch adds a saveBatch to the interface for performance (e.g. when hitting IDB many times and then needing to update all instances). This PR can be applied to those semantics, too.

Copilot

Pull request overview

Updates the NodeFS-backed storage adapter to make filesystem writes read-atomic and (on POSIX) crash-atomic using a temp-file + fsync + rename strategy, and adds targeted tests/documentation for the new durability guarantees.

Changes:

- Implement atomic write path for save() using <baseDirectory>/.tmp/, fsync, and rename, plus POSIX directory fsync for rename durability.
- Add cache rollback behavior on write failures to keep in-memory state consistent with on-disk state.
- Extend NodeFS adapter tests with atomicity/durability and cache-rollback scenarios; expand package docs with durability/atomicity details.
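
For readers unfamiliar with the temp-file + fsync + rename pattern summarized here, a minimal sketch follows. It is not the PR's code: `atomicWrite` and its parameters are hypothetical names, and it assumes the `.tmp` directory lives under the base directory so the rename stays on one filesystem.

```typescript
import { promises as fs } from "node:fs"
import * as path from "node:path"
import { randomBytes } from "node:crypto"

// Hypothetical helper illustrating the pattern; not the adapter's actual API.
export async function atomicWrite(
  baseDir: string,
  relPath: string,
  data: Uint8Array
): Promise<void> {
  const finalPath = path.join(baseDir, relPath)
  const tmpDir = path.join(baseDir, ".tmp")
  await fs.mkdir(tmpDir, { recursive: true })
  await fs.mkdir(path.dirname(finalPath), { recursive: true })

  // Unique temp name so concurrent writers never share a temp file.
  const tmpPath = path.join(tmpDir, `${randomBytes(8).toString("hex")}.tmp`)
  try {
    const handle = await fs.open(tmpPath, "wx")
    try {
      await handle.writeFile(data)
      await handle.sync() // flush file contents to stable storage
    } finally {
      await handle.close()
    }
    // Readers see either the old file or the complete new one, never a partial write.
    await fs.rename(tmpPath, finalPath)
  } catch (err) {
    await fs.rm(tmpPath, { force: true }) // best-effort cleanup of the temp file
    throw err
  }

  // On POSIX, fsync the parent directory so the rename itself is durable.
  if (process.platform !== "win32") {
    const dir = await fs.open(path.dirname(finalPath), "r")
    try {
      await dir.sync()
    } finally {
      await dir.close()
    }
  }
}
```

The directory fsync is skipped on Windows in this sketch because opening a directory handle this way is a POSIX idiom; how the PR treats Windows durability is not shown in this thread.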

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| packages/automerge-repo-storage-nodefs/src/index.ts | Implements atomic write + directory fsync durability, tmp directory handling, and cache rollback behavior. |
| packages/automerge-repo-storage-nodefs/test/NodeFSStorageAdapter.test.ts | Adds new tests intended to validate atomic write behavior and cache rollback on failures. |
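
The cache rollback behavior can be pictured roughly as follows. This is a sketch under assumptions, not the adapter's implementation: `CachedStore` and `writeToDisk` are hypothetical names, the cache is assumed to be updated optimistically before the disk write, and saves to a given key are assumed not to overlap.

```typescript
// Hypothetical in-memory cache in front of a disk writer.
export class CachedStore {
  private cache = new Map<string, Uint8Array>()

  constructor(
    private writeToDisk: (key: string, data: Uint8Array) => Promise<void>
  ) {}

  async save(key: string, data: Uint8Array): Promise<void> {
    const previous = this.cache.get(key) // undefined if the key was absent
    this.cache.set(key, data) // optimistic update
    try {
      await this.writeToDisk(key, data)
    } catch (err) {
      // The write failed, so restore the cache to what is actually on disk.
      if (previous !== undefined) this.cache.set(key, previous)
      else this.cache.delete(key)
      throw err
    }
  }

  get(key: string): Uint8Array | undefined {
    return this.cache.get(key)
  }
}
```

The error is rethrown after the rollback so the caller still learns that the save failed.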


Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


darcyparker · 2026-06-06T03:35:59Z

@alexjg @msakrejda @expede, #600 lines up with a cluster of open issues and PRs that all look like one failure class, and it seemed worth connecting them in one place.

The class: async work from storage or sync (a rejected save/load, a throw while decoding a peer message) that is neither awaited nor caught. It surfaces as an unhandled rejection or uncaught exception, so in Node the process exits, and any in-flight write is torn or lost.

You can reproduce the mechanism in isolation: #673 includes a self-contained script (eventemitter3 + a real socket data handler) where a throw in a listener escapes emit(), reaches the event loop, and exits the process with a non-zero code. That is the same shape behind every PR below.
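
The core of that mechanism can be shown without a socket. The sketch below is not the script from #673: it swaps eventemitter3 for Node's built-in `node:events` EventEmitter, and `emitSafely` is a hypothetical name. It only demonstrates that a throw inside a listener propagates out of `emit()` to whoever called it.

```typescript
import { EventEmitter } from "node:events"

const emitter = new EventEmitter()

// A listener that throws on malformed input, standing in for message decoding.
emitter.on("message", (raw: string) => {
  JSON.parse(raw) // throws SyntaxError if raw is not valid JSON
})

// emit() invokes listeners synchronously, so a listener's throw surfaces here.
export function emitSafely(raw: string): Error | undefined {
  try {
    emitter.emit("message", raw)
    return undefined
  } catch (err) {
    return err instanceof Error ? err : new Error(String(err))
  }
}
```

When `emit()` is called from an I/O callback with no surrounding try/catch, nothing sits between that throw and the event loop, which is why the process exits.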

Existing issues that are this class:

- #389 "Storage errors cause node to exit": a saveDoc rejection from a fire-and-forget listener becomes an unhandledRejection and exits. Still open, with a corroborating report.
- #275 "Intermittent CI test failures": the flake is an unhandled rejection (bad bloom: ... not a Uint8Array, a malformed inbound sync message) firing after teardown.
- #264 "Data durability question": the durability motivation behind this PR.

The PRs, by layer:

- Disk atomicity: #600 "Read-atomic/crash-atomic NodeFSStorageAdapter" (write-temp + fsync + rename) and #602 "Add saveBatch and safer write order for Subduction bridge" (write order). The "even if we crash mid-write, don't tear the file" layer.
- Catching the error at the listener / fire-and-forget boundary. These are all the same shape (an event listener or un-awaited call whose async work rejects or throws), so the useful split is by what fails:
  - Storage I/O: #671 "fix(StorageSource): settle the storage source when a load fails" (StorageSource load), #676 "fix(StorageSource): catch errors from the throttled save" (StorageSource save, on the heads-changed listener, i.e. the save side of #389), and #672 "fix(StorageSubsystem): reset the compacting flag in a finally" (StorageSubsystem compaction).
  - Untrusted peer input: #673 "fix(Repo): catch errors thrown while handling an inbound message" (Repo inbound-message listener) and #674 "fix(WebSocketServerAdapter): catch errors thrown while handling a message" (WebSocketServerAdapter decode). A malformed peer message can currently exit a Node sync server; #297 "WebSocket Ready flag should only be set after Open event" was an earlier instance on the send path, fixed by not throwing.
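
As an illustration of catching at the fire-and-forget boundary, here is a minimal sketch. It is not taken from any of the PRs listed: `guarded` and `onError` are hypothetical names, and the real fixes may route errors differently.

```typescript
type AsyncListener<A extends unknown[]> = (...args: A) => Promise<void>

// Wraps an async listener so a rejection (or a synchronous throw) is routed
// to onError instead of becoming an unhandled rejection.
export function guarded<A extends unknown[]>(
  fn: AsyncListener<A>,
  onError: (err: unknown) => void
): (...args: A) => void {
  return (...args: A): void => {
    try {
      fn(...args).catch(onError)
    } catch (err) {
      // Covers a function that throws before it returns a promise.
      onError(err)
    }
  }
}
```

A listener registered as `emitter.on("heads-changed", guarded(save, report))`, with `save` and `report` supplied by the caller, would then report a failed save rather than exit the process.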

Bottom line: one failure class at different layers (uncaught async from storage/sync, leading to unhandled rejection / process exit / torn or lost writes). #600 makes a write survive a crash; the rejection-handling PRs stop the crash and stop dropping the error. They are complementary.

The deeper "do it properly" direction (make storage/network I/O abort-aware and properly awaited rather than fire-and-forget) is sketched in a WIP branch, darcy/promisifications_phase0: an optional AbortSignal on the adapter interfaces, withAbort/withTimeout helpers, and a dev-docs/abort-patterns.md write-up. It is not a PR, and I will likely break it into focused PRs the way #671 and #676 already are; sharing it for the direction.
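
To give a feel for what helpers named withAbort/withTimeout might look like, here is a sketch. The signatures, the `AbortedError` class, and the behavior are all assumptions; the WIP branch's actual helpers are not shown in this thread and may differ.

```typescript
// Hypothetical error type used when the caller stops waiting.
export class AbortedError extends Error {
  constructor() {
    super("operation aborted")
    this.name = "AbortedError"
  }
}

// Settles with `work`, or rejects early if `signal` aborts first.
// Assumption: this only stops waiting; it does not cancel the underlying I/O.
export function withAbort<T>(work: Promise<T>, signal?: AbortSignal): Promise<T> {
  if (!signal) return work
  if (signal.aborted) {
    work.catch(() => {}) // avoid an unhandled rejection from abandoned work
    return Promise.reject(new AbortedError())
  }
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(new AbortedError())
    signal.addEventListener("abort", onAbort, { once: true })
    work.then(
      value => {
        signal.removeEventListener("abort", onAbort)
        resolve(value)
      },
      err => {
        signal.removeEventListener("abort", onAbort)
        reject(err)
      }
    )
  })
}

// Rejects with AbortedError if `work` has not settled within `ms` milliseconds.
export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), ms)
  const result = withAbort(work, controller.signal)
  const clear = () => clearTimeout(timer)
  result.then(clear, clear) // stop the timer once the race is decided
  return result
}
```

An adapter call would be wrapped as `await withTimeout(adapter.load(key), 5000)`, where `adapter` and `key` stand for whatever the caller has in hand.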

Happy to fold the cross-references into #389 as an umbrella, or open a short tracking issue, if that is easier to follow.

expede force-pushed the nodefs-atomic-writes branch 2 times, most recently from a5725b5 to af0af0a April 19, 2026 00:30

expede changed the title ~~WIP~~ Read-atomic/crash-atomic NodeFSStorageAdapter Apr 19, 2026

expede marked this pull request as ready for review April 19, 2026 00:34

Copilot AI review requested due to automatic review settings April 19, 2026 00:34

Copilot started reviewing on behalf of expede April 19, 2026 00:34

Copilot AI reviewed Apr 19, 2026

expede force-pushed the nodefs-atomic-writes branch from af0af0a to bb97333 April 19, 2026 00:51

expede requested a review from Copilot April 19, 2026 00:51

Copilot started reviewing on behalf of expede April 19, 2026 00:52

Copilot AI reviewed Apr 19, 2026

Comment thread packages/automerge-repo-storage-nodefs/src/index.ts

Comment thread packages/automerge-repo-storage-nodefs/src/index.ts Outdated

Comment thread packages/automerge-repo-storage-nodefs/src/index.ts

expede force-pushed the nodefs-atomic-writes branch 2 times, most recently from 5496db6 to 0e2d814 April 19, 2026 01:13

Read-atomic/crash-atomic NodeFSStorageAdapter

628b3c9

expede force-pushed the nodefs-atomic-writes branch from 0e2d814 to 628b3c9 April 19, 2026 01:16

This was referenced Apr 19, 2026:

- Read-atomic/crash-atomic NodeFS & Subduction bridge #598 (Closed)
- Tracking PR: Subduction support #601 (Draft)
- Add saveBatch and safer write order for Subduction bridge #602 (Merged)

expede requested a review from alexjg April 27, 2026 22:11


expede wants to merge 1 commit into main from nodefs-atomic-writes


3 participants
