
v0.6.0: persistent caching, Spotlight discovery, 900× faster repeat runs #105

Open
django23 wants to merge 63 commits into stevegrunwell:develop from django23:improvements

django23 commented Mar 5, 2026

Summary

This PR builds on the v0.4.0 work (Bats migration, new sentinels, fixed dirs, --dry-run, error handling) and adds persistent caching, incremental Spotlight discovery, and major performance optimizations.

On a real-world system with ~460 dependency directories and 15K+ Spotlight candidates, repeat runs drop from ~75 seconds to ~5 seconds, a 15× improvement. First-time runs with many un-excluded paths drop from ~75 minutes to ~79 seconds (roughly 57×), thanks to the three-layer exclusion defense.

New features

  • Persistent path cache (~/.cache/asimov/paths): stores discovered dependency paths across runs, so subsequent runs skip the full find traversal entirely
  • Incremental Spotlight discovery via mdfind: finds newly created projects without a full filesystem traversal
  • Persistent excluded-path state (~/.cache/asimov/excluded): tracks successful tmutil addexclusion calls and handles interrupted runs and Spotlight indexing delays
  • Persistent failed-path state (~/.cache/asimov/failed): remembers paths where tmutil fails (e.g. Go module paths with @ characters) and skips them on subsequent runs
  • mdfind-seen cache (~/.cache/asimov/mdfind_seen): remembers checked candidates so directories without a sentinel aren't re-validated every run
  • Config file (~/.config/asimov/config): enable/disable fixed dirs, add extra dirs and sentinels, disable built-in sentinels
  • --full-scan: force a full filesystem scan, ignoring all caches
  • --no-cache: run without reading or writing any cache
  • --stats: opt-in per-directory sizes and a total-space summary
  • --quiet: suppress all non-error output
  • --verbose: show already-excluded directories and timing details
  • Optional [directory] argument: scope scans to a specific directory
  • Curl installer (scripts/install-remote.sh): installs without Homebrew

Performance (real-world, ~460 dependency paths)

| Metric     | v0.3.0  | This PR | Speedup |
|------------|---------|---------|---------|
| First run  | ~75 min | ~79 s   | ~57×    |
| Repeat run | ~75 s   | ~5 s    | ~15×    |

Key optimizations

  • Three-layer exclusion defense (bulk grep, fixed-dir descendant filter, tmutil isexcluded guard): eliminates the redundant ~11 s of tmutil addexclusion calls
  • sort+comm for set difference instead of BSD grep -Fxvf: O(n log n) vs O(n×m)
  • Descendant prefix-filter eliminates ~14,000 nested paths before sentinel checking
  • Bulk grep -Fxvf replaces per-path subprocess spawns (~27,000 eliminated)
  • Bash parameter expansion replaces dirname/basename subprocesses
  • Streaming tee writes the cache incrementally during find, so an interrupted scan still leaves a partial cache
  • Per-path tmutil calls instead of batching: a batch provides no speed benefit, and one bad path fails the entire batch
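Three of the techniques above (sort+comm set difference, the descendant prefix filter, and parameter expansion in place of dirname/basename) can be sketched in a few lines of bash. This is an illustrative sketch, not asimov's code; all function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from asimov.

# Set difference with sort+comm: print lines of file $1 that are not in file $2.
# comm requires both inputs sorted with the same collation, hence LC_ALL=C.
set_difference() {
  LC_ALL=C comm -23 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Descendant prefix filter: read paths on stdin and drop any path nested under
# another input path. Appending "/" before sorting keeps each root's
# descendants contiguous right after it (so "/a-b" cannot split "/a" from
# "/a/c"), which lets one pass compare against the last kept root only.
drop_descendants() {
  sed 's:/*$:/:' | LC_ALL=C sort -u | awk '
    kept != "" && index($0, kept) == 1 { next }
    { kept = $0; print substr($0, 1, length($0) - 1) }
  '
}

# Parameter expansion instead of dirname/basename subprocesses. Unlike the
# real tools, these do not special-case trailing slashes or bare names.
parent_dir() { printf '%s\n' "${1%/*}"; }
base_name()  { printf '%s\n' "${1##*/}"; }
```

comm reads both sorted streams once, so the cost is dominated by the two sorts, which is where the O(n log n) bound comes from.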

Breaking changes

  1. Fixed dirs (global caches) are now opt-in via the config file (previously always on)
  2. LaunchAgent label renamed from com.stevegrunwell.asimov to com.django23.asimov

Tests

153 tests (Bats), shellcheck clean:

  • 60 sentinel tests (1:1 with sentinel array)
  • 19 cache tests (creation, cached runs, stale removal, incremental discovery, dedup, failed paths, mdfind-seen)
  • Behavioral tests, config tests, flag combination tests, format boundary tests, plist validation

- Removed PHP-based testing infrastructure, including PHPUnit and related files.
- Introduced Bats as the new testing framework with corresponding test scripts.
- Replaced Travis CI configuration with GitHub Actions for CI/CD.
- Updated README to reflect new installation and usage instructions.
- Added Makefile for simplified commands: `make install` and `make uninstall`.
- Updated `.gitignore` to exclude new files and directories.
- Enhanced the changelog with recent changes and additions.
- Added `make install` and `make uninstall` targets for easier setup and removal.
- Introduced `scripts/uninstall.sh` for clean removal of Asimov and its launchd schedule.
- Updated `com.stevegrunwell.asimov.plist` with common interval reference comments.
- Moved the install script to `scripts/install.sh`, now copying the binary instead of symlinking.
- Revised README to reflect new installation instructions and optional plist editing.
Add tests for all 34 sentinel pairs (up from 12), plus negative cases,
~/Library skip path, mixed dependency types, nested project handling,
and find -prune verification. Split monolithic asimov.bats into
sentinels.bats and behavior.bats for better organization.
Add exclusion patterns for modern development tooling that has emerged
since the last update, covering JS frameworks, Clojure, Zig, OCaml, Elm,
Godot, R, direnv, and additional Python/Elixir/Terraform variants.

New patterns: .next, .nuxt, .angular, .svelte-kit, .turbo, .yarn,
target/project.clj, target/deps.edn, .cpcache, .shadow-cljs,
venv/pyproject.toml, __pypackages__, _build/mix.exs, .terraform,
.direnv, _build/dune-project, .zig-cache, zig-out, elm-stuff,
.godot, renv.
… badges

Tighten the intro, add a supported ecosystems table covering all 30+
patterns, add a quick start section, update badges with logos, fix the
Isaac Asimov typo, and streamline installation/contributing sections.
Add a comprehensive contributing guide covering setup, adding new
dependency patterns, commit conventions, and project structure.
Update copyright year to 2017-2026.
…upport

Add glob pattern support for sentinel definitions so wildcards like
*.xcodeproj can be used. Sentinels containing '*' use sh -c with ls -d
for glob expansion instead of test -e, with no performance impact.

Add DerivedData *.xcodeproj entry to exclude Xcode build artifacts when
an Xcode project is present alongside the DerivedData directory.

Inspired by stevegrunwell#64 (props @mdab121).
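The glob-sentinel approach described in this commit (expand the pattern with `ls -d` inside a child `sh -c`, otherwise fall back to `test -e`) might look roughly like the following sketch. `sentinel_exists` and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if sentinel $2 exists in directory $1.
sentinel_exists() {
  local parent="$1" sentinel="$2"
  if [[ "$sentinel" == *'*'* ]]; then
    # Glob sentinel (e.g. *.xcodeproj): a child shell expands $2 unquoted.
    # With no match the literal pattern reaches ls -d, which then fails.
    sh -c 'cd "$1" && ls -d $2' sh "$parent" "$sentinel" >/dev/null 2>&1
  else
    # Plain sentinel (e.g. package.json): one existence test is enough.
    test -e "$parent/$sentinel"
  fi
}
```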
…light (#10)

Add a note explaining that asimov does not hide directories from
Spotlight indexing, with guidance on how to configure Spotlight privacy
settings separately.

Addresses stevegrunwell#90.
Exclude bin/ and obj/ directories when *.csproj (C#) or *.fsproj (F#)
project files are present, using glob sentinel patterns.

Inspired by stevegrunwell#87, props @guigomesa.
* docs(readme): clarify that asimov only affects Time Machine, not Spotlight

Add a note explaining that asimov does not hide directories from
Spotlight indexing, with guidance on how to configure Spotlight privacy
settings separately.

Addresses stevegrunwell#90.

* feat(sentinels): add .NET project build directory exclusions

Exclude bin/ and obj/ directories when *.csproj (C#) or *.fsproj (F#)
project files are present, using glob sentinel patterns.

Inspired by stevegrunwell#87, props @guigomesa.

* feat: skip already-excluded directories for faster subsequent runs

Use Spotlight metadata (mdfind) to identify directories already excluded
from Time Machine and skip them during the find traversal. Also fixes a
comment typo and removes duplicate Gradle sentinel entries.

Inspired by stevegrunwell#97, props @VladRassokhin.

* feat: exclude well-known global cache directories

Add fixed directory exclusions for common tool caches (~/.cache,
~/.gradle/caches, ~/.m2/repository, ~/.npm/_cacache, ~/.nuget/packages,
~/.kube/cache) that are always safe to exclude without sentinel files.

Inspired by stevegrunwell#69, props @pkuczynski.
When asimov runs as root (via brew services or sudo), ~ expands to
/var/root. Now detects the console user via stat/dscl and uses their
home directory instead.

Addresses stevegrunwell#72.
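A sketch of the console-user lookup described above. It assumes the single-line `NFSHomeDirectory: /path` form of `dscl` output, and `stat -f '%Su'` is the BSD/macOS syntax; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Owner of /dev/console is the logged-in GUI user (macOS/BSD stat syntax).
# With nobody logged in this is typically "root".
console_user() {
  stat -f '%Su' /dev/console
}

# Extract the path from dscl output such as "NFSHomeDirectory: /Users/alice".
parse_home_dir() {
  sed -n 's/^NFSHomeDirectory:[[:space:]]*//p'
}

# Use $HOME normally; when running as root, look up the console user's home.
resolve_home() {
  if [ "$(id -u)" -ne 0 ]; then
    printf '%s\n' "$HOME"
    return
  fi
  local user
  user="$(console_user)" || return 1
  dscl . -read "/Users/$user" NFSHomeDirectory | parse_home_dir
}
```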
Show the total count and combined size of newly excluded directories
after each run, making it easy to see the impact at a glance.

Inspired by stevegrunwell#84, props @Vadorequest.
When tmutil addexclusion fails (e.g. Error -20 or -50 on paths inside
app bundles or with permission issues), skip the path with a warning
instead of crashing. This allows asimov to continue processing
remaining directories.

Addresses stevegrunwell#101 and stevegrunwell#86.
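Continuing past a failed `tmutil addexclusion` call could look like this sketch. `exclude_each_path` and `FAILED_FILE` are hypothetical names. Recording failures to a file anticipates the later failed-path state (~/.cache/asimov/failed), but the format shown here is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: exclude paths read from stdin one at a time; on failure, warn,
# record the path, and keep going instead of aborting the whole run.
exclude_each_path() {
  local path failed_file="${FAILED_FILE:-/dev/null}"
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if tmutil addexclusion "$path" 2>/dev/null; then
      printf 'Excluded %s\n' "$path"
    else
      printf 'Warning: could not exclude %s, skipping\n' "$path" >&2
      printf '%s\n' "$path" >> "$failed_file"
    fi
  done
}
```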
Added .cursor, .idea, and .vscode to .gitignore so editor-specific files (Cursor, JetBrains IDEs, VS Code) are not tracked in the repository.
Changed the GitHub repository links in CONTRIBUTING.md and README.md from django23 to stevegrunwell to reflect the new repository ownership.
Introduced a new --dry-run flag that allows users to see which directories would be excluded from Time Machine backups without actually modifying any settings. This feature enhances usability by providing a preview of actions before execution. Updated relevant functions and tests to support this functionality.

Also, refactored the handling of ASIMOV_ROOT to ensure correct path resolution when running as root, and made adjustments to the exclusion logic for improved performance.
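A dry-run guard can wrap the single place where tmutil is called. `ASIMOV_DRY_RUN` is named elsewhere in this PR, but the `true`/`false` values and the helper name below are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Assumed convention: --dry-run sets ASIMOV_DRY_RUN=true.
ASIMOV_DRY_RUN=false

# Hypothetical helper: preview the exclusion in dry-run mode, else apply it.
exclude_or_preview() {
  local path="$1"
  if [ "$ASIMOV_DRY_RUN" = true ]; then
    # Preview only: nothing is passed to tmutil.
    printf 'Would exclude %s\n' "$path"
  else
    tmutil addexclusion "$path"
  fi
}
```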
Introduced `--help` and `--version` flags to the asimov script, enhancing user experience by providing usage information and version details. Updated the command-line argument parsing to handle unknown options gracefully, displaying an error message and usage instructions. Refactored the exclusion summary logic to improve clarity and maintainability.

Also, updated documentation and tests to reflect these new features.
Bumped the version to 0.4.0 and added a new function, `record_excluded_path()`, to streamline the logging of excluded paths and their sizes. Implemented validation to ensure `ASIMOV_ROOT` exists before execution, improving error handling. Updated tests to cover scenarios with spaces in project paths and the new root directory validation. Enhanced documentation to reflect these changes.
Dropped macOS 13 from the CI test matrix as it has been retired. This change streamlines the testing process and ensures compatibility with supported macOS versions.
django23 added 30 commits March 2, 2026 18:15
…le printf

Restructure to use early-return style for the zero-count case and a
single printf at the end, removing the duplicated format string.
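The early-return shape could be sketched with a hypothetical `print_summary` that receives a count and a preformatted size. The silent zero case and the message wording are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: early return for the zero case, then exactly one printf.
print_summary() {
  local count="$1" size="$2" noun="directories"
  if [ "$count" -eq 0 ]; then
    return 0   # assumption: nothing to report when nothing was excluded
  fi
  if [ "$count" -eq 1 ]; then
    noun="directory"
  fi
  printf 'Excluded %d %s (%s)\n' "$count" "$noun" "$size"
}
```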
ASIMOV_DRY_RUN is unconditionally declared at the top of the script, so
the :- default-value guard in exclude_paths_from_stdin() is not needed.
Prevents noise in syslog when run by launchd. Interactive use unchanged.
Default output now only shows newly excluded paths, warnings, and the
summary. The 'already excluded, skipping' messages (which dominate output
on subsequent runs) require --verbose. This reduces log noise when run
by launchd as a daily scheduled job.
Covers all 15 planned improvements: distribution (Homebrew tap, curl
installer, plist rename), performance (remove tmutil isexcluded
redundancy), README rewrite, new flags (--quiet), config file support,
unit tests, project hygiene, and migration docs.
13 tasks covering distribution, performance, config file support,
--quiet flag, tests, README rewrite, migration docs, and project
hygiene. Plus post-plan Homebrew tap instructions.
Avoids conflicts when both the original and this fork are installed.
The old label should be unloaded first; see UPGRADING.md.
The mdfind optimization already prunes excluded paths from the find
command. The per-path isexcluded check is now --verbose-only, cutting
~30-50% of subprocess spawns per run. Mock tmutil updated to be
idempotent (matching real tmutil behavior).
Useful for launchd/cron where only errors matter. Mutually exclusive
with --verbose. Verbosity spectrum: --quiet < (default) < --verbose.
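The quiet/default/verbose spectrum maps naturally onto a numeric level. In this sketch, `VERBOSITY`, `parse_verbosity_flags`, and `log` are hypothetical names. Errors would still go straight to stderr, outside `log`.

```bash
#!/usr/bin/env bash
# Levels: 0 = --quiet, 1 = default, 2 = --verbose.
VERBOSITY=1

parse_verbosity_flags() {
  local saw_quiet=0 saw_verbose=0 arg
  for arg in "$@"; do
    case "$arg" in
      --quiet) saw_quiet=1 ;;
      --verbose) saw_verbose=1 ;;
    esac
  done
  if [ "$saw_quiet" -eq 1 ] && [ "$saw_verbose" -eq 1 ]; then
    printf 'Error: --quiet and --verbose are mutually exclusive\n' >&2
    return 1
  fi
  if [ "$saw_quiet" -eq 1 ]; then VERBOSITY=0; fi
  if [ "$saw_verbose" -eq 1 ]; then VERBOSITY=2; fi
}

# log LEVEL MESSAGE: print MESSAGE only if the current verbosity allows it.
log() {
  if [ "$VERBOSITY" -ge "$1" ]; then
    printf '%s\n' "$2"
  fi
}
```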
Locks down integer-truncation behavior at KB/MB/GB boundaries.
Tests the extracted function via awk to avoid executing the full script.
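Integer truncation at the unit boundaries is easiest to see in a small formatter that takes KB (as `du -sk` reports). `format_kb` is a hypothetical name, and the thresholds assume 1024-based units.

```bash
#!/usr/bin/env bash
# Sketch: format a size in KB using shell integer division, so values
# truncate (1048575 KB -> "1023 MB") instead of rounding up.
format_kb() {
  local kb="$1"
  if [ "$kb" -ge 1048576 ]; then
    printf '%d GB\n' "$((kb / 1048576))"
  elif [ "$kb" -ge 1024 ]; then
    printf '%d MB\n' "$((kb / 1024))"
  else
    printf '%d KB\n' "$kb"
  fi
}
```

A Bats test could pull such a function out of the script with an awk range pattern, something like `awk '/^format_kb\(\)/,/^}/' asimov`, assuming the closing brace sits in column 0.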
Users can now enable/disable fixed dirs, add extra fixed dirs,
add extra sentinel pairs, and disable built-in sentinels via an
INI-style config file. Fixed dirs default to off (breaking change
from v0.4.x). See UPGRADING.md for migration instructions.
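Reading an INI-style file in pure bash can be done line by line, tracking the current `[section]`. This is a generic sketch: `read_ini` is hypothetical, and the section and key names in the test file are invented, not asimov's actual config keys.

```bash
#!/usr/bin/env bash
# Sketch: print each key as "section.key=value", skipping blanks and comments.
read_ini() {
  local file="$1" line section="" key value
  while IFS= read -r line || [ -n "$line" ]; do
    # Trim leading and trailing whitespace.
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line%"${line##*[![:space:]]}"}"
    case "$line" in
      ''|'#'*|';'*) continue ;;                              # blank or comment
      '['*']') section="${line#\[}"; section="${section%\]}" ;;  # [section]
      *=*)
        key="${line%%=*}"; value="${line#*=}"
        key="${key%"${key##*[![:space:]]}"}"                 # trim end of key
        value="${value#"${value%%[![:space:]]*}"}"           # trim start of value
        printf '%s.%s=%s\n' "$section" "$key" "$value"
        ;;
    esac
  done < "$file"
}
```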
Installs to ~/.local/bin (no sudo), sets up daily launchd schedule,
warns if ~/.local/bin is not in PATH.
Concise structure with install-first layout, collapsed ecosystem table,
config file docs, upgrade notice, and credits to original author.
Covers migration from v0.4.x (this fork) and from the original
stevegrunwell/asimov.
v0.5.0 is a breaking release: fixed dirs default off, plist label
renamed, config file support, --quiet flag, performance improvements,
and new distribution channels.
macOS ships bash 3.2 where "${empty_array[@]}" with set -u triggers
"unbound variable". Guard ASIMOV_CONFIG_DISABLED_SENTINELS and
ASIMOV_CONFIG_EXTRA_SENTINELS iterations so the script works on
both system bash (3.2) and Homebrew bash (5.x).
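One common guard is the `${arr[@]+"${arr[@]}"}` expansion. It yields nothing for an empty array and the quoted elements otherwise, on both bash 3.2 and 5.x. In this sketch a hypothetical `EXTRA_SENTINELS` array stands in for the config arrays named above.

```bash
#!/usr/bin/env bash
set -u

EXTRA_SENTINELS=()   # hypothetical array that may legitimately stay empty

# Plain "${EXTRA_SENTINELS[@]}" aborts with "unbound variable" on bash 3.2
# under set -u when the array is empty; the +-guarded form does not.
count_items() {
  local n=0 item
  for item in ${EXTRA_SENTINELS[@]+"${EXTRA_SENTINELS[@]}"}; do
    n=$((n + 1))
  done
  printf '%d\n' "$n"
}
```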
…entation

Updated the asimov script to accept a directory argument for scanning, defaulting to the home directory if none is provided. Enhanced usage output to reflect this change and added error handling for non-existent directories. Introduced a new 'bench' target in the Makefile for timing dry-run scans. Added tests to verify the new functionality.
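The optional directory argument reduces to a default plus an existence check. `resolve_scan_root` is a hypothetical name, and normalizing with `pwd -P` is an added assumption.

```bash
#!/usr/bin/env bash
# Sketch: use $1 if given, else $HOME; fail early if it is not a directory.
resolve_scan_root() {
  local dir="${1:-$HOME}"
  if [ ! -d "$dir" ]; then
    printf 'Error: directory not found: %s\n' "$dir" >&2
    return 1
  fi
  # Print an absolute, symlink-resolved path without a trailing slash.
  (cd "$dir" && pwd -P)
}
```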
…erformance

Introduced a new --stats option to the asimov script, allowing users to view per-directory sizes and a total space summary when excluding directories. This change optimizes performance by computing sizes only when requested, rather than by default. Updated usage documentation and tests to reflect these enhancements.
Enhanced the asimov script to utilize a cache for already-excluded paths, significantly improving performance by reducing the overhead of per-path pruning in find operations. This change eliminates O(directories × excluded_paths) complexity, resulting in speed improvements of up to 37%. Updated tests to reflect the new caching behavior and adjusted documentation accordingly.
Added a persistent path cache to store discovered dependency paths, significantly reducing runtime for subsequent runs. Introduced `--full-scan` and `--no-cache` flags to control scanning behavior. Enhanced the usage documentation to reflect these changes and updated tests to ensure proper functionality of the new caching mechanism and incremental discovery via Spotlight.
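A cache refresh that merges newly discovered candidates, de-duplicates, and drops stale entries could be sketched like this. `refresh_path_cache` and its one-path-per-line file format are assumptions; the PR only states that the cache lives at ~/.cache/asimov/paths.

```bash
#!/usr/bin/env bash
# Sketch: merge cached and new paths, keep only directories that still exist,
# then replace the cache in one rename so readers never see a partial file.
refresh_path_cache() {
  local cache="$1" new_paths="$2" staged path
  staged="$(mktemp "${cache}.XXXXXX")" || return 1
  {
    if [ -f "$cache" ]; then cat "$cache"; fi
    cat "$new_paths"
  } | LC_ALL=C sort -u | while IFS= read -r path; do
    if [ -d "$path" ]; then printf '%s\n' "$path"; fi   # drop stale entries
  done > "$staged"
  mv "$staged" "$cache"
}
```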
…caching

Introduced a persistent failed-path state to track paths where `tmutil addexclusion` fails, allowing these paths to be skipped on subsequent runs. Added a mdfind-seen cache to remember all candidates checked by Spotlight, preventing redundant checks for directories without sentinels. Updated tests to validate the new functionality and ensure proper handling of failed paths and mdfind candidates.
Updated ASIMOV_VERSION to 0.6.0 to reflect the latest changes and improvements in the project. Ensure synchronization with CHANGELOG and package managers.