Commit 998c6e8

Add new Cache interface
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
1 parent 964ce2a commit 998c6e8

2 files changed

Lines changed: 657 additions & 14 deletions

adr/20260202-global-cache.md

Lines changed: 32 additions & 14 deletions
@@ -53,15 +53,18 @@ These limitations result in redundant computation when:
 
 ## Solution Approach
 
-Extend the existing `nf-cloudcache` plugin to support content-addressable global caching on cloud object storage (S3, GCS, Azure Blob).
+Refactor Nextflow's task caching into a plugin-extensible architecture so that different global cache implementations can be delivered as plugins, with the existing local/cloud cache behaviour preserved as the default. The first concrete global cache implementation extends `nf-cloudcache` for content-addressable caching on cloud object storage (S3, GCS, Azure Blob).
 
 **Rationale for this approach:**
-- `nf-cloudcache` already exists and handles cloud storage integration
-- Cloud storage provides strong consistency guarantees for concurrent access
-- Many organizations already use cloud storage for work directories
-- Cloud providers support atomic operations needed for coordination
-- No new infrastructure required
-- Scalable and accessible from anywhere
+- A pluggable architecture lets different global cache implementations co-exist without forking core code.
+- Today's cache logic is hardcoded across `TaskProcessor`, `TaskHasher`, `CacheDB`, `PublishDir`, and `FilePorter` — different cache implementations need different decisions about task identity, coordination, output storage, ref counting, and cleanup. These decisions belong behind interfaces, not in core.
+- `nf-cloudcache` already exists and handles cloud storage integration, providing a natural starting point for the first global cache implementation.
+- Cloud storage provides strong consistency guarantees for concurrent access and supports atomic operations needed for coordination.
+- Many organizations already use cloud storage for work directories.
+- No new infrastructure required for the cloud-storage variant.
+- Scalable and accessible from anywhere.
+
+The refactor itself is described in `specs/260507-pluggable-cache-architecture/spec.md`. It introduces five plugin-extensible seams — pluggable `TaskHasher`, outputs-shaped cache resolution (`beginTask`/`endTask`), workdir adoption, file-usage events, and optional cleanup capability — and preserves byte-identical behaviour when no plugin is registered.
 
 **Trade-offs:**
 - Higher latency than local filesystem (~100-500ms vs ~10ms)
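
Illustration (not part of the commit): the seams named in this hunk can be pictured with a toy, in-memory cache. This is only a sketch in Python, not Nextflow's Groovy. The class `ToyManagedCache`, the method `cleanup_candidates`, and every signature below are hypothetical. Only the seam names (`beginTask`/`endTask`, `adopt`, `notifyPublish`) and the `KEEP`/`DELETE` dispositions come from the diff, rendered here in snake_case. The real contracts are defined in the referenced spec and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class WorkdirDisposition(Enum):
    KEEP = "KEEP"      # leave the task work directory in place
    DELETE = "DELETE"  # outputs now live in cache-managed storage


@dataclass
class ToyManagedCache:
    """Hypothetical cache that owns task outputs and counts their consumers."""
    outputs_by_hash: Dict[str, List[str]] = field(default_factory=dict)
    ref_counts: Dict[str, int] = field(default_factory=dict)

    def begin_task(self, task_hash: str) -> Optional[List[str]]:
        # Outputs-shaped resolution: a hit returns the cached outputs,
        # a miss returns None and the caller runs the task.
        return self.outputs_by_hash.get(task_hash)

    def end_task(self, task_hash: str, outputs: List[str]) -> None:
        # Record the outputs produced by a task that had to run.
        self.outputs_by_hash[task_hash] = list(outputs)

    def adopt(self, task_hash: str, workdir: str) -> WorkdirDisposition:
        # Assumption: this toy cache manages the outputs itself,
        # so it tells the caller the work directory can be removed.
        return WorkdirDisposition.DELETE

    def notify_publish(self, path: str) -> None:
        # File-usage event: one more consumer references this output.
        self.ref_counts[path] = self.ref_counts.get(path, 0) + 1

    def cleanup_candidates(self) -> List[str]:
        # Outputs that no file-usage event has referenced so far.
        known = {p for outs in self.outputs_by_hash.values() for p in outs}
        return sorted(p for p in known if self.ref_counts.get(p, 0) == 0)
```

A first `begin_task` call for an unseen hash returns `None`. After `end_task` stores the outputs, the same call returns them, and outputs never passed to `notify_publish` are the ones a cleanup pass could consider.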
@@ -1050,26 +1053,41 @@ process prod_analysis { ... }
 
 ### Implementation Plan
 
+The implementation is split between (a) a core refactor that introduces plugin-extensible cache interfaces (covered by `specs/260507-pluggable-cache-architecture/spec.md`) and (b) the global cache implementation itself, which becomes the first non-default consumer of those interfaces.
+
 **Phase 0: Proof of concept (#6100)**
 1. Associate nf-cloudcache path and workdir with the global-cache path and active resume by default
 2. Constant sessionId (0000-000-000), remove processName from task hash
 3. Optional: Use deep cache mode
 
-**Phase 1: Core functionality*
-1. Implement global hash algorithm (no sessionId, no processName)
-2. Implement content-based file hashing
+**Phase 1: Pluggable cache architecture (core refactor)**
+
+See `specs/260507-pluggable-cache-architecture/spec.md` for the full design. Summary:
+1. Pluggable `TaskHasher` factory (extending #6927) so cache implementations declare their preferred hasher.
+2. Outputs-shaped cache resolution (`beginTask` / `endTask`) on `CacheDB`; default implementation preserves today's `LockManager` + workdir-mkdir loop, relocated.
+3. Workdir adoption hook (`adopt`) + `WorkdirDisposition` (KEEP/DELETE) so cache implementations can move outputs into managed storage and dispose of the workdir.
+4. File-usage events (`notifyPublish`, `notifyFilePort`) for ref-counted cleanup.
+5. Optional `CleanupCapable` capability for `nextflow cache clean ...`.
+
+All five seams ship with default implementations that preserve byte-identical behaviour and on-disk format.
+
+**Phase 2: Global cache hasher**
+1. Implement global hash algorithm as a `TaskHasher` plugin (no sessionId, no processName).
+2. Implement content-based file hashing inside the global hasher.
 
-**Phase 2: Concurrency control**
-1. Add cloud storage lock acquisition (S3 conditional PUT)
-2. Test race condition handling
+**Phase 3: Global cache implementation**
+1. Implement a `CacheFactory` + `CacheDB` for the global cache, wiring outputs-shaped restore, adoption (with `DELETE` disposition for cache-managed outputs), and file-usage events.
+2. Add cloud storage lock acquisition (S3 conditional PUT, GCS `ifGenerationMatch=0`, Azure `If-None-Match: *`) inside the global `beginTask` implementation.
+3. Test race condition handling.
 
 **Phase 4: Polish**
 1. Add configuration options
-2. Implement cache cleanup commands
+2. Implement cache cleanup commands via `CleanupCapable`
 3. Documentation and examples
 
 ## Links
 
+- [Pluggable cache architecture spec](../specs/260507-pluggable-cache-architecture/spec.md) - Core refactor enabling pluggable global cache implementations
 - [nf-cloudcache plugin](../plugins/nf-cloudcache/) - Foundation for global cache
 - [CloudCacheConfig](../plugins/nf-cloudcache/src/main/nextflow/cache/CloudCacheConfig.groovy) - Configuration class
 - [TaskProcessor.groovy](../modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy) - Cache checking logic (lines 825-839, 925-1001)
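
Illustration (not part of the commit): Phases 2 and 3 combine two ideas, a task hash that ignores session and process name, and a lock taken with a create-if-absent write. The sketch below shows both in Python under stated assumptions. SHA-256, the `locks/` key prefix, and all function and class names are hypothetical choices for illustration. `InMemoryObjectStore.put_if_absent` only emulates the "write succeeds only if the key does not exist" semantics that the diff attributes to S3 conditional PUT, GCS `ifGenerationMatch=0`, and Azure `If-None-Match: *`. It does not call any cloud SDK.

```python
import hashlib
import threading
from typing import Dict, Iterable


def global_task_hash(script: str, input_contents: Iterable[bytes]) -> str:
    """Hash a task from its script and input file contents only.

    No session id and no process name are mixed in, so the same work
    run from a different session yields the same key.
    """
    h = hashlib.sha256()
    h.update(script.encode("utf-8"))
    for content in input_contents:
        # Content-based file hashing: digest the bytes, not path or mtime.
        h.update(b"\0")
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()


class InMemoryObjectStore:
    """Stand-in for an object store that supports create-if-absent writes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._mutex = threading.Lock()

    def put_if_absent(self, key: str, body: bytes) -> bool:
        # Atomic check-and-write: returns False if the key already exists.
        with self._mutex:
            if key in self._objects:
                return False
            self._objects[key] = body
            return True


def try_acquire(store: InMemoryObjectStore, task_hash: str, owner: str) -> bool:
    # Exactly one caller per task hash can create the lock object.
    return store.put_if_absent(f"locks/{task_hash}", owner.encode("utf-8"))
```

When several runs race on the same hash, only the first `try_acquire` returns `True`. In this sketch the others would then wait for, or restore, the winner's outputs, which is the case the "Test race condition handling" step would need to exercise.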
