
Improve large-library reliability with optional stateful engine and adaptive backoff#1321

Open
MadsenDev wants to merge 2 commits into icloud-photos-downloader:master from MadsenDev:master

Conversation

@MadsenDev

Summary

This PR improves reliability and scalability for large iCloud photo libraries while preserving existing CLI behavior by default.

It introduces an optional stateful engine mode that unifies retry/backoff handling and adds adaptive throttling, bounded download concurrency, and task-level resumability via SQLite.

Default (stateless) behavior remains unchanged.

This update has not yet been tested with an actual iCloud account.


Motivation

Users with large libraries (50k–200k assets) frequently encounter systemic failures due to:

  • 429 / throttling responses
  • 503 transient errors
  • ACCESS_DENIED conditions that behave like throttling
  • Limited retry coverage across layers
  • Rerun-based recovery that re-enumerates assets repeatedly

These issues often require multiple manual restarts to complete large downloads.

This PR addresses those pain points while keeping the existing user experience intact.


Key Changes

1. Unified Retry & Backoff

  • Configurable retry count (default > 0)
  • Exponential backoff with jitter
  • Honors Retry-After for 429 and 503
  • Error classification (fatal vs transient vs re-auth)

Applied consistently across metadata queries and downloads.
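As a rough sketch of the retry policy described above (the names `with_retries`, `backoff_delay`, and `TransientError`, and all parameter defaults, are illustrative assumptions, not the PR's actual API):

```python
import random
import time
from typing import Callable, Optional

class TransientError(Exception):
    """Hypothetical error class for responses classified as transient (429/503)."""
    def __init__(self, status: int, retry_after: Optional[float] = None):
        super().__init__(f"transient HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # parsed from a Retry-After header, if present

def backoff_delay(attempt: int, base_delay: float = 1.0, cap: float = 60.0,
                  retry_after: Optional[float] = None) -> float:
    """Exponential backoff with full jitter; a server-provided Retry-After wins."""
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base_delay * (2 ** attempt)))

def with_retries(fn: Callable, max_retries: int = 5):
    """Run fn(); retry transient failures with backoff, re-raise after the budget."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError as err:
            if attempt == max_retries:
                raise  # budget exhausted: surface the error
            time.sleep(backoff_delay(attempt, retry_after=err.retry_after))
```

Fatal and re-auth conditions would bypass this loop entirely; only errors classified as transient are retried.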


2. Optional Stateful Engine Mode

  • Introduces a SQLite-backed task database
  • Tracks per-asset per-version task state
  • Adds checkpointing during enumeration
  • Supports safe resume after interruption
  • Lease-based task requeueing for crash recovery

Stateless filesystem-based resume remains available when the state DB is not used.
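One way to picture the lease-based task table (the schema, column names, and lease duration below are assumptions, not the PR's actual database layout):

```python
import sqlite3
import time
from typing import Optional, Tuple

def open_task_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS tasks (
            asset_id    TEXT NOT NULL,
            version     TEXT NOT NULL,            -- e.g. 'original', 'adjusted'
            state       TEXT NOT NULL DEFAULT 'pending',
            lease_until REAL NOT NULL DEFAULT 0,  -- epoch seconds; 0 = unleased
            PRIMARY KEY (asset_id, version)
        )""")
    return db

def claim_task(db: sqlite3.Connection,
               lease_seconds: float = 300.0) -> Optional[Tuple[str, str]]:
    """Claim one pending task whose lease is absent or expired.

    A worker that crashes never marks its task done, so its lease simply
    expires and the task becomes claimable again on the next run.
    """
    now = time.time()
    row = db.execute(
        "SELECT asset_id, version FROM tasks "
        "WHERE state = 'pending' AND lease_until < ? LIMIT 1", (now,)).fetchone()
    if row is None:
        return None
    db.execute("UPDATE tasks SET lease_until = ? WHERE asset_id = ? AND version = ?",
               (now + lease_seconds, *row))
    db.commit()
    return row

def complete_task(db: sqlite3.Connection, asset_id: str, version: str) -> None:
    db.execute("UPDATE tasks SET state = 'done' WHERE asset_id = ? AND version = ?",
               (asset_id, version))
    db.commit()
```

Checkpointing during enumeration would insert rows incrementally, so an interrupted run resumes from the task table instead of re-enumerating the whole library.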


3. Bounded Concurrency with Adaptive Throttling

  • Adds configurable download worker pool
  • Introduces account-scoped rate limiter
  • Adaptive behavior on throttle events (cool-down + reduced concurrency)
  • Prevents repeated burst-triggered throttling loops
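A minimal sketch of the adaptive behavior above (the class name, parameters, and halve-on-throttle policy are illustrative assumptions about one reasonable implementation):

```python
import threading
import time

class AdaptiveLimiter:
    """Account-scoped limiter: bounded concurrency, cool-down on throttle events."""

    def __init__(self, max_workers: int = 4, min_workers: int = 1,
                 cooldown: float = 30.0):
        self._cond = threading.Condition()
        self.max_workers = max_workers
        self.min_workers = min_workers
        self.cooldown = cooldown
        self.limit = max_workers       # current allowed concurrency
        self.active = 0                # downloads in flight
        self.cooldown_until = 0.0      # no new acquires before this time

    def acquire(self) -> None:
        """Block until a download slot is free and any cool-down has elapsed."""
        with self._cond:
            while self.active >= self.limit or time.monotonic() < self.cooldown_until:
                self._cond.wait(timeout=0.1)
            self.active += 1

    def release(self) -> None:
        with self._cond:
            self.active -= 1
            self._cond.notify_all()

    def on_throttle(self) -> None:
        """429/503 seen: halve concurrency and pause new downloads briefly."""
        with self._cond:
            self.limit = max(self.min_workers, self.limit // 2)
            self.cooldown_until = time.monotonic() + self.cooldown

    def on_success(self) -> None:
        """Gradually restore concurrency after successful downloads."""
        with self._cond:
            if self.limit < self.max_workers:
                self.limit += 1
                self._cond.notify_all()
```

Halving on throttle and growing back by one on success (AIMD-style) is what breaks the burst-throttle-burst loop: after a 429 the pool shrinks and waits out the cool-down instead of immediately re-triggering the limit.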

4. Download Improvements

  • Increased streaming chunk size for improved throughput
  • File size verification (default on)
  • Optional checksum verification
  • Hardened Range resume logic
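The verification and Range-resume pieces can be sketched roughly as follows (function names and exact semantics are hypothetical, not the PR's actual helpers):

```python
import hashlib
import os
from typing import Optional

def resume_range_header(partial_path: str) -> Optional[dict]:
    """Build a Range header to resume from an existing partial file, if any."""
    try:
        offset = os.path.getsize(partial_path)
    except OSError:
        return None  # no partial file: start from byte 0 with no Range header
    return {"Range": f"bytes={offset}-"} if offset > 0 else None

def verify_download(path: str, expected_size: int,
                    expected_sha256: Optional[str] = None) -> bool:
    """Size check is cheap (on by default); the checksum pass is optional."""
    if os.path.getsize(path) != expected_size:
        return False
    if expected_sha256 is not None:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            # Read in large chunks, mirroring the increased streaming chunk size
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                h.update(chunk)
        if h.hexdigest() != expected_sha256:
            return False
    return True
```

A failed verification would mark the task for re-download rather than leaving a silently truncated file on disk.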

5. Reduced Unnecessary Metadata Requests

  • Optional --no-remote-count
  • Tunable album page size

6. Observability & Exit Semantics

  • Structured logging improvements
  • Clearer exit codes for partial vs fatal failures
  • State DB maintenance options

Backward Compatibility

  • Existing CLI behavior remains default.
  • No changes to folder structure or naming policies.
  • Experimental/stateful features are opt-in.
  • Deprecated --threads-num remains untouched.

Trade-offs & Scope

This PR increases internal complexity due to the addition of task persistence and concurrency controls.

However, the goal was to maintain:

  • Backward compatibility
  • Default-safe behavior
  • Single-host operational simplicity

Distributed scaling and async rewrites are explicitly out of scope.


Testing

  • Unit tests updated for retry/backoff logic
  • Fault-injection testing for 429/503 and resume edge cases
  • Manual large-library testing pending broader validation

Request for Feedback

This is a larger change set. I’m happy to split this into smaller PRs if preferred, or adjust implementation details based on maintainer guidance.
