Skip to content

Read-atomic/crash-atomic NodeFSStorageAdapter#600

Open
expede wants to merge 1 commit into
mainfrom
nodefs-atomic-writes
Open

Read-atomic/crash-atomic NodeFSStorageAdapter#600
expede wants to merge 1 commit into
mainfrom
nodefs-atomic-writes

Conversation

@expede

@expede expede commented Apr 19, 2026

Copy link
Copy Markdown
Member

The NodeFS storage backend can fail and leave torn writes. This PR improves the NodeFS write safety guarantees for POSIX and Windows targets. Possibly controversial is that we explicitly fsync on POSIX, which adds up to 100us-1ms per write (depending on the specific SSD hardware) in exchange for durability guarantees. This doesn't get us to fully transactional writes (with CAS before/after gates, WAL, etc etc), but SIGNIFICANTLY improves atomic write reliability with read-/crash-atomicity.

Anecdotally: we were getting lots of torn writes in pushwork (due to an early exit bug) — we switched to this before fixing pushwork and haven't seen any torn writes since.

We've rebased subductionjs over this PR. That branch adds a saveBatch to the interface for performance (e.g. hitting IDB many times but then need to update all instances) — this PR can be applied to those semantics, too

@expede expede force-pushed the nodefs-atomic-writes branch 2 times, most recently from a5725b5 to af0af0a Compare April 19, 2026 00:30
@expede expede changed the title WIP Read-atomic/crash-atomic NodeFSStorageAdapter Apr 19, 2026
@expede expede marked this pull request as ready for review April 19, 2026 00:34
Copilot AI review requested due to automatic review settings April 19, 2026 00:34

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the NodeFS-backed storage adapter to make filesystem writes read-atomic and (on POSIX) crash-atomic using a temp-file + fsync + rename strategy, and adds targeted tests/documentation for the new durability guarantees.

Changes:

  • Implement atomic write path for save() using <baseDirectory>/.tmp/, fsync, and rename, plus POSIX directory fsync for rename durability.
  • Add cache rollback behavior on write failures to keep in-memory state consistent with on-disk state.
  • Extend NodeFS adapter tests with atomicity/durability and cache-rollback scenarios; expand package docs with durability/atomicity details.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
packages/automerge-repo-storage-nodefs/src/index.ts Implements atomic write + directory fsync durability, tmp directory handling, and cache rollback behavior.
packages/automerge-repo-storage-nodefs/test/NodeFSStorageAdapter.test.ts Adds new tests intended to validate atomic write behavior and cache rollback on failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/automerge-repo-storage-nodefs/test/NodeFSStorageAdapter.test.ts Outdated
Comment thread packages/automerge-repo-storage-nodefs/test/NodeFSStorageAdapter.test.ts Outdated
Comment thread packages/automerge-repo-storage-nodefs/src/index.ts Outdated
Comment thread packages/automerge-repo-storage-nodefs/src/index.ts

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/automerge-repo-storage-nodefs/src/index.ts
Comment thread packages/automerge-repo-storage-nodefs/src/index.ts Outdated
Comment thread packages/automerge-repo-storage-nodefs/src/index.ts
@expede expede force-pushed the nodefs-atomic-writes branch 2 times, most recently from 5496db6 to 0e2d814 Compare April 19, 2026 01:13
@darcyparker

darcyparker commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

@alexjg @msakrejda @expede, #600 lines up with a cluster of open issues and PRs that all look like one failure class, and it seemed worth connecting them in one place.

The class: async work from storage or sync (a rejected save/load, a throw while decoding a peer message) that is neither awaited nor caught. It surfaces as an unhandled rejection or uncaught exception, so in Node the process exits, and
any in-flight write is torn or lost.

You can reproduce the mechanism in isolation: #673 includes a self-contained script (eventemitter3 + a real socket data handler) where a throw in a listener escapes emit(), reaches the event loop, and exits the process with a non-zero
code. That is the same shape behind every PR below.

Existing issues that are this class:

The PRs, by layer:

Bottom line: one failure class at different layers (uncaught async from storage/sync, leading to unhandled rejection / process exit / torn or lost writes). #600 makes a write survive a crash; the rejection-handling PRs stop the crash and stop dropping the error. They are complementary.

The deeper "do it properly" direction (make storage/network I/O abort-aware and properly awaited rather than fire-and-forget) is sketched in a WIP branch, darcy/promisifications_phase0: an optional AbortSignal on the adapter interfaces, withAbort/withTimeout helpers, and a dev-docs/abort-patterns.md write-up. It is not a PR, and I will likely break it into focused PRs the way #671 and #676 already are; sharing it for the direction.

Happy to fold the cross-references into #389 as an umbrella, or open a short tracking issue, if that is easier to follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants