Lotman V2 #3517
Draft
bbockelm wants to merge 28 commits into
Begin migrating the lotman storage-lot capability off the external C library (libLotMan.so / purego) into a native, standalone-friendly Go package. This first increment lays the foundation:
- lotman/core: new package depending only on gorm + goose + stdlib, with no Pelican imports (enforced by an AST-based boundary test) so it can be promoted to its own repository later.
- GORM models for lots, parents, paths, usage, parent attributions, and reclamations; owner and management-policy attributes folded into the lots table; all timestamps in Unix milliseconds.
- Embedded Goose migrations defining the authoritative schema with foreign-key cascade, applied via Manager.Migrate() and tracked in a private lotman_goose_db_version table so the schema coexists in the shared Pelican SQLite database without colliding with other components.
- Manager.New/Migrate with injectable Options (strict hierarchy, contraction policy, admin override, clock, logger).
Tests cover nil-DB rejection, migration creation + idempotency, option defaults, and the dependency boundary.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the lot management operations on top of the core data model:
- Create/read/update/delete: AddLot (lot + parents + paths + zeroed usage in one transaction), GetLot, UpdateLot (owner and/or MPA), AddToLot, RemoveLot (reparents direct children to the lot's parents) and RemoveLotRecursive (cascades to the whole subtree), RemoveParents (lot must retain >=1 parent), RemovePaths.
- Hierarchy traversal: LotExists, ListAllLots, IsRoot, and GetParents / GetChildren / GetOwners with optional recursion, cycle-safe against self-parent (root) edges.
- Validation: management-policy sentinel rules (-1 == unbounded; an unbounded dedicated quota requires an unbounded opportunistic quota) and lifecycle windows (all-zero == non-expiring, otherwise all-positive and ordered).
- Ownership authorization: creating a child requires owning a parent; modifying requires owning the lot or a parent. An empty caller denotes a trusted/system call (used to bootstrap root/default lots).
Tests cover the happy paths, validation failures, duplicate detection, recursive traversal, authorization, parent-retention rollback, child reparenting, and cascade deletion.
Also drops a stray doc reference from the package comment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
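The recursive GetParents walk has to terminate on the self-parent edge that marks a root. A minimal stdlib-only sketch, assuming a hypothetical in-memory `parents` map (the real code walks the parents table through GORM) and a visited set as the cycle guard:

```go
package main

import (
	"fmt"
	"sort"
)

// parents maps a lot name to its direct parents. A root lot lists itself as
// its own parent (a self-edge), which a naive recursive walk would loop on.
type parents map[string][]string

// ancestors returns every lot reachable by following parent edges from lot,
// excluding lot itself. The visited set makes the walk terminate on
// self-parent (root) edges and on any accidental cycle.
func ancestors(p parents, lot string) []string {
	visited := map[string]bool{lot: true}
	var out []string
	stack := append([]string(nil), p[lot]...)
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if visited[cur] {
			continue
		}
		visited[cur] = true
		out = append(out, cur)
		stack = append(stack, p[cur]...)
	}
	sort.Strings(out) // deterministic order for callers and tests
	return out
}

func main() {
	p := parents{
		"root":  {"root"}, // self-edge marks the root
		"atlas": {"root"},
		"user1": {"atlas"},
	}
	fmt.Println(ancestors(p, "user1")) // [atlas root]
	fmt.Println(ancestors(p, "root"))  // []
}
```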
Implements longest-prefix path resolution, ported faithfully from the reference SQL behavior:
- LotsFromDir: resolve the lot owning a path at an instant. Honors the exclusion-override rule (a longer covering exclusion on the same lot suppresses its inclusion), the active lifecycle window, and reclamation; appends ancestors when recursive. Includes the attribution fallback that ignores the active window (preferring the generation created at or before the query time and closest to it) so bytes are not stranded on the default lot during a generation-rotation gap. Unmatched paths resolve to "default".
- LotsForPath: the windowed variant returning every lot owning a path at any instant in [lo, hi), via a sweep-line that lets a lot win when some moment of its active interval is not shadowed by a strictly longer-claim lot, with mid-window reclamation clipping and a default-lot gap fallback.
- Paths are normalized (absolute, cleaned, no trailing slash) on store and in resolution, and only the depth-bounded set of ancestor prefixes is loaded.
Note: a non-recursive path matches only its exact path, not its children (matching the reference behavior); object trees should be owned by recursive paths.
Tests cover longest-prefix selection, exclusion carve-outs, active-window and attribution fallback, and the windowed union with default-gap detection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
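The point-in-time rules (exact or recursive-ancestor coverage, same-lot exclusion override, longest surviving inclusion, `default` fallback) can be sketched over an in-memory slice. The `lotPath` type and `resolve` function are hypothetical, and lifecycle windows, reclamation, and the attribution fallback are deliberately left out:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// lotPath is a hypothetical in-memory row: one path claimed by one lot.
type lotPath struct {
	Lot       string
	Path      string
	Recursive bool // covers the path and everything below it
	Exclude   bool // carves the path out of the lot instead of claiming it
}

// normalize makes a path absolute, cleaned, and free of a trailing slash.
func normalize(p string) string { return path.Clean("/" + p) }

// covers reports whether entry e applies to the normalized query path q.
// A non-recursive entry matches only its exact path, never its children.
func covers(e lotPath, q string) bool {
	ep := normalize(e.Path)
	if ep == q {
		return true
	}
	if !e.Recursive {
		return false
	}
	return ep == "/" || strings.HasPrefix(q, ep+"/")
}

// resolve returns the lot owning q: the longest covering inclusion whose lot
// has no longer covering exclusion, or "default" when nothing survives.
func resolve(entries []lotPath, q string) string {
	q = normalize(q)
	best, bestLen := "default", -1
	for _, inc := range entries {
		if inc.Exclude || !covers(inc, q) {
			continue
		}
		incLen := len(normalize(inc.Path))
		suppressed := false
		for _, exc := range entries {
			if exc.Exclude && exc.Lot == inc.Lot && covers(exc, q) &&
				len(normalize(exc.Path)) > incLen {
				suppressed = true
				break
			}
		}
		if !suppressed && incLen > bestLen {
			best, bestLen = inc.Lot, incLen
		}
	}
	return best
}

func main() {
	entries := []lotPath{
		{Lot: "atlas", Path: "/atlas", Recursive: true},
		{Lot: "atlas", Path: "/atlas/scratch", Recursive: true, Exclude: true},
		{Lot: "user1", Path: "/atlas/user1/", Recursive: true},
	}
	fmt.Println(resolve(entries, "/atlas/data/f"))    // atlas
	fmt.Println(resolve(entries, "/atlas/scratch/f")) // default (carved out)
	fmt.Println(resolve(entries, "/atlas/user1/f"))   // user1 (longer prefix)
	fmt.Println(resolve(entries, "/cms/f"))           // default (unmatched)
}
```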
Implements per-lot usage tracking with the reference's recompute-based rollup:
- UpdateLotUsage: set (absolute) or adjust (delta) a lot's self_gb, self_objects, self_gb_being_written, and self_objects_being_written, rejecting any update that would store a negative value. After applying, the children rollup is recomputed for the lot's ancestors so the result is immediately consistent.
- UpdateLotUsageByDir: resolve each path to its owning lot using attribution semantics, aggregate usage per lot, and apply it.
- GetLotUsage: report self, children, and total usage across all four axes.
- RecalculateChildrenUsage: full recompute over every lot, for use after a batch of updates or to repair drift.
The children rollup sums descendants' self_* values but excludes any lot with a reclamation row, so reclaimed generations no longer inflate their ancestors' totals. Ancestor/descendant traversal is cycle-safe and runs within the updating transaction.
Tests cover multi-level rollup, delta accumulation with negative-result rejection and rollback, absolute negative rejection, exclusion of reclaimed descendants, and path-resolved by-dir updates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the parent/child reservation invariants on top of the lot model:
- Parent attributions: a child's MPA on each axis (dedicated_GB,
opportunistic_GB, max_num_objects) is distributed across its non-self
parents as fractions -- explicit per-parent values where given, the
remainder split equally, an unbounded child propagated as 1.0 to every
parent -- with shortfall/overage rejection.
- The three hierarchy axioms, enforced atomically on every mutation that can
break them (rolling back on violation):
* Axiom 1: each parent's attributed share may not exceed the parent's MPA
on any axis; an unbounded child may not sit under a bounded parent.
* Axiom 2: a sweep-line over children's active windows ensures the peak
concurrent attributed allocation never exceeds the parent's MPA, so
non-overlapping reservations can reuse the same capacity.
* Axiom 3: a child's lifecycle window must lie within every parent's.
- AvailableCapacity: advisory remaining capacity under a parent over a time
window, via the same sweep-line (nil for unbounded axes).
- PolicyAttributes: the most restrictive value for each requested attribute
across a lot and its ancestors, and which lot imposes it.
Enforcement is gated on the strict-hierarchy option; attributions are always
stored so capacity queries work regardless. PolicyAttributes compares values
numerically, so the unbounded sentinel (-1) is reported as most restrictive
and should be interpreted as unbounded by callers.
Tests cover each axiom's rejection path and the non-overlapping-allowed case,
explicit multi-parent attribution splits checked via AvailableCapacity,
attribution overage rejection, unbounded-axis capacity, and restrictive
attribute resolution.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
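Axiom 2's sweep-line can be illustrated with start/end events over half-open windows. This is a sketch, not the core's code: `reservation`, `peakConcurrent`, and `axiom2OK` are hypothetical names, attributed values are assumed to already be absolute bytes, and unbounded children are assumed to have been rejected by Axiom 1 first:

```go
package main

import (
	"fmt"
	"sort"
)

// reservation is a hypothetical child allocation active over [Start, End).
type reservation struct {
	Start, End int64 // Unix milliseconds, half-open interval
	Bytes      int64 // bytes attributed to the parent
}

// peakConcurrent returns the largest total of Bytes active at any instant.
// At equal timestamps releases sort before acquisitions, so back-to-back
// windows ([0,10) then [10,20)) do not overlap and can reuse capacity.
func peakConcurrent(rs []reservation) int64 {
	type event struct {
		at    int64
		delta int64
	}
	events := make([]event, 0, 2*len(rs))
	for _, r := range rs {
		events = append(events, event{r.Start, r.Bytes}, event{r.End, -r.Bytes})
	}
	sort.Slice(events, func(i, j int) bool {
		if events[i].at != events[j].at {
			return events[i].at < events[j].at
		}
		return events[i].delta < events[j].delta // releases first
	})
	var cur, peak int64
	for _, e := range events {
		cur += e.delta
		if cur > peak {
			peak = cur
		}
	}
	return peak
}

// axiom2OK reports whether the peak concurrent allocation fits the parent's
// limit; -1 marks an unbounded parent.
func axiom2OK(parentLimit int64, rs []reservation) bool {
	return parentLimit == -1 || peakConcurrent(rs) <= parentLimit
}

func main() {
	rs := []reservation{{0, 10, 80}, {10, 20, 80}}
	fmt.Println(axiom2OK(100, rs)) // true: non-overlapping windows reuse capacity
	rs = append(rs, reservation{5, 15, 30})
	fmt.Println(axiom2OK(100, rs)) // false: peak is 110
}
```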
Implements the queries the cache eviction loop uses to decide what to purge, and the reclamation ledger:
- LotsPastExp / LotsPastDel: lots whose expiration / deletion time has passed at a given instant (non-expiring lots never qualify), with optional recursive expansion to descendants and a reclamation filter evaluated at that instant.
- LotsPastDed / LotsPastOpp / LotsPastObj: lots over their dedicated, dedicated+opportunistic, or object-count quota. The non-hierarchical form compares self (or self+children) usage to the threshold and skips axes marked unbounded (-1). The hierarchical form computes an adjusted usage that adds each child's capped overage to its parent (excluding unbounded and reclaimed children, and unbounded/reclaimed parents) and returns results deepest-first.
- ReclaimLot: records that a lot and its descendants have been reclaimed, as an append-only ledger (existing rows are never overwritten), returning whether any new row was added or every target was already reclaimed. The default lot cannot be reclaimed. Ancestors' children rollups are recomputed so reclaimed usage immediately stops counting.
Tests cover the time-based boundaries and non-expiring exclusion, dedicated quota with and without the children rollup, hierarchical overage attribution and deepest-first ordering, the object cap with unbounded-axis skipping, the reclaim cascade with idempotency and rollup drop, the default-lot guard, and the exclusion of reclaimed lots from quota results.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
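The append-only ledger semantics can be sketched with a map keyed by lot name (the `ledger` type is hypothetical), assuming the caller has already expanded the lot's descendants. Re-reclaiming keeps the original timestamp and reports that nothing new was added:

```go
package main

import (
	"errors"
	"fmt"
)

// ledger is a hypothetical append-only reclamation table: lot name -> time it
// was first reclaimed (Unix ms). Existing entries are never overwritten.
type ledger map[string]int64

var errDefaultLot = errors.New("the default lot cannot be reclaimed")

// reclaim records lot and its descendants as reclaimed at time now. It
// reports whether any new row was added; false means every target was
// already reclaimed.
func (l ledger) reclaim(lot string, descendants []string, now int64) (bool, error) {
	if lot == "default" {
		return false, errDefaultLot
	}
	added := false
	for _, t := range append([]string{lot}, descendants...) {
		if _, ok := l[t]; ok {
			continue // append-only: keep the original reclamation time
		}
		l[t] = now
		added = true
	}
	return added, nil
}

func main() {
	l := ledger{}
	added, _ := l.reclaim("gen1", []string{"gen1a"}, 100)
	fmt.Println(added) // true
	added, _ = l.reclaim("gen1", []string{"gen1a"}, 200)
	fmt.Println(added, l["gen1"]) // false 100
	_, err := l.reclaim("default", nil, 300)
	fmt.Println(err) // the default lot cannot be reclaimed
}
```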
…ests
Completes the native core with parity-focused validation fixes and
invariant coverage:
- Fix two validation divergences found by reading the reference source:
* Management-policy attributes now reject any negative value that is not
exactly -1 (previously only values strictly below -1 were rejected, so
e.g. -0.5 slipped through).
* Lifecycle timestamps now require a strict creation < expiration (a
non-empty half-open interval) and permit non-zero negative values,
matching the reference exactly.
- Add sentinel/invariant tests: the valid/invalid MPA combinations, the
timestamp rules (strict-less, partial-zero rejection, negatives allowed),
unbounded child under unbounded parent allowed, a fully-unbounded lot never
appearing in any past-quota query, a zero-dedicated lot always past its
dedicated quota (the catch-all eviction shape), and a reclaimed lot dropping
out of path resolution.
The core now has 47 passing tests and depends only on gorm, goose, and the
standard library.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
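The validation rules after this fix can be summarized as below. This is a sketch under stated assumptions: the function names are hypothetical, MPA values are `float64` to show the `-0.5` case from before the bytes switch, and requiring expiration <= deletion is an assumed reading of "ordered":

```go
package main

import (
	"errors"
	"fmt"
)

const unbounded = -1 // sentinel: no limit on this axis

// validateMPA rejects negative values other than exactly -1, and requires an
// unbounded opportunistic quota whenever the dedicated quota is unbounded.
func validateMPA(dedicated, opportunistic float64) error {
	for _, v := range []float64{dedicated, opportunistic} {
		if v < 0 && v != unbounded {
			return fmt.Errorf("invalid value %v: only -1 may be negative", v)
		}
	}
	if dedicated == unbounded && opportunistic != unbounded {
		return errors.New("unbounded dedicated requires unbounded opportunistic")
	}
	return nil
}

// validateWindow accepts an all-zero (non-expiring) window; otherwise every
// timestamp must be non-zero (negatives allowed) with creation < expiration.
func validateWindow(creation, expiration, deletion int64) error {
	if creation == 0 && expiration == 0 && deletion == 0 {
		return nil
	}
	if creation == 0 || expiration == 0 || deletion == 0 {
		return errors.New("timestamps must be all zero or all non-zero")
	}
	if creation >= expiration {
		return errors.New("creation must be strictly before expiration")
	}
	if expiration > deletion { // assumed ordering, see lead-in
		return errors.New("expiration must not be after deletion")
	}
	return nil
}

func main() {
	fmt.Println(validateMPA(-0.5, 0) != nil)         // true: rejected
	fmt.Println(validateMPA(-1, -1) == nil)          // true: fully unbounded
	fmt.Println(validateWindow(-20, -10, -5) == nil) // true: negatives allowed
	fmt.Println(validateWindow(10, 10, 20) != nil)   // true: empty interval
}
```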
…nt GB
Switch the lotman core's storage accounting from floating-point GB to int64 bytes throughout, making all quota, rollup, sweep-line, and axiom math exact integer arithmetic.
- Schema: dedicated_gb/opportunistic_gb and the self_/children_ usage columns become *_bytes INTEGER; parent attributions store an absolute attributed_value INTEGER (with -1 = unbounded) instead of a fraction REAL, removing the last fraction-times-bytes multiplication.
- Types: MPA, usage updates/reports, available-capacity, and restrictive policy values are now int64; the storage axes are named *Bytes.
- Logic: dropped the 1e-9 comparison tolerances and rounding; over-quota and capacity checks are exact.
int64 (~9.2 EB) is far beyond any cache, and sums are bounded by physical storage, so there is no overflow risk. Conversion to/from GB is deferred to the external edges (REST API, config, and the C ABI), and the bytes-native cache will feed usage in directly.
All core tests are updated and passing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The lotman core stores storage quantities as int64 bytes, while the adapter, REST API, and PolicyDefinitions config speak GB as float64. Add sentinel-aware conversion helpers at that boundary, using the existing decimal-GB factor (1e9): gbToBytes/bytesToGB plus pointer variants and an Int64FromFloat helper. The unbounded sentinel is preserved across the unit change (-1 GB maps to -1 bytes, not -1e9), and a nil GB pointer maps to 0 (defaults are applied upstream). Tested for round-trips, the sentinel, and nil handling. This is the first step of moving the lotman adapter off the libLotMan.so purego binding onto the native core; the wrapper bodies follow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
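A sketch of the boundary helpers with the sentinel carried through unchanged. `gbToBytes`/`bytesToGB` are named in the commit, but these bodies are assumptions; in particular, rounding to the nearest byte and the `gbPtrToBytes` name are this sketch's choices:

```go
package main

import (
	"fmt"
	"math"
)

const (
	bytesPerGB = 1e9 // decimal GB
	unbounded  = -1  // "no limit" sentinel, identical in both units
)

// gbToBytes converts GB to bytes, keeping the sentinel as -1 rather than
// scaling it to -1e9. Rounding absorbs float noise such as 1.11*1e9 not
// being exactly representable.
func gbToBytes(gb float64) int64 {
	if gb == unbounded {
		return unbounded
	}
	return int64(math.Round(gb * bytesPerGB))
}

// bytesToGB is the inverse conversion, again preserving the sentinel.
func bytesToGB(b int64) float64 {
	if b == unbounded {
		return unbounded
	}
	return float64(b) / bytesPerGB
}

// gbPtrToBytes maps a nil (unset) GB pointer to 0; defaults are applied upstream.
func gbPtrToBytes(gb *float64) int64 {
	if gb == nil {
		return 0
	}
	return gbToBytes(*gb)
}

func main() {
	fmt.Println(gbToBytes(1.11))    // 1110000000
	fmt.Println(gbToBytes(-1))      // -1, not -1e9
	fmt.Println(bytesToGB(-1))      // -1
	fmt.Println(gbPtrToBytes(nil))  // 0
}
```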
Introduce the process-wide core.Manager holder (getManager/setManager) and a logrus-backed core.Logger adapter, plus the input mapping layer that converts the adapter's GB-based public types into the core's byte-based specs: lotToSpec, mpaToCore, parentAttrToCore, and the gbPtr->bytesPtr helper. The unbounded sentinel is preserved across the unit change and unset object counts/timestamps map to sensible defaults (unbounded / non-expiring). These are additive and unwired; the lotman package still builds with the libLotMan.so binding in place. The wrapper bodies and InitLotman switch over to this manager next. Tested: type mapping (including 1.11 GB -> 1.11e9 bytes and the -1 sentinel), nil-MPA defaults, and the manager holder. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cache V2 will account usage per owning lot instead of per first-path-component namespace. Add the hot-path resolver: an in-memory longest-prefix index that maps an object path to its owning lot (and a stable LotID), mirroring the lotman core's point-in-time resolution rules -- exact-match or recursive-ancestor coverage, a longer covering exclusion on the same lot suppresses a shorter inclusion, the longest surviving inclusion wins, and unmatched paths fall to the default lot. The index is rebuilt from the lotman core when lots change, so object ingest never queries the lot database. Tested for longest-prefix selection, exclusion carve-outs, non-recursive exact-only matching, default fallback, stable/preserved ids across rebuilds, and building the index from a live core manager.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the result-side mappers that convert core query results (bytes) back into the adapter's GB-based public types: lotViewToAdapter, coreMPAToAdapter, usageRowToLotUsage (with splitStorage deriving the dedicated/opportunistic usage breakdown from total usage vs the lot's MPA), restrictiveToAdapter, and capacityToAdapter (an unbounded axis reports 0, matching the prior null-decodes-to-zero behavior). Together with the input mappers this completes the GB<->bytes mapping layer between the adapter and the native core. Still additive and unwired; the package builds with the libLotMan.so binding in place. Tested: the storage split (including unbounded axes) and capacity nil-handling. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The persistent cache can serve multiple federations, and object identity is federation-aware (the pelican:// host is part of the object hash), but the legacy namespace bucket is path-only and collapses federations together. Make lot resolution federation-qualified so the same path in two federations maps to two different lots: the resolution key prefixes the object path with its federation discovery host (/osg-htc.org/atlas/file), and lots are stored with matching federation-qualified paths. The lot core stays purely path-based -- federation is simply the top segment of the path namespace. Adds federationQualifiedKey() plus tests for key derivation and federation isolation (two federations sharing an /atlas prefix resolve to distinct lots). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
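Deriving the key might look like the following. `federationQualifiedKey` is named in the commit; this signature (taking the discovery URL) and the choice to drop any port via `Hostname()` are assumptions:

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// federationQualifiedKey builds the lot-resolution key for an object by
// making the federation discovery host the top path segment.
func federationQualifiedKey(discoveryURL, objectPath string) (string, error) {
	u, err := url.Parse(discoveryURL)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in discovery URL %q", discoveryURL)
	}
	// path.Join cleans the result, so "/atlas/file" and "atlas/file" agree.
	return path.Join("/", u.Hostname(), objectPath), nil
}

func main() {
	a, _ := federationQualifiedKey("https://osg-htc.org", "/atlas/file")
	b, _ := federationQualifiedKey("https://other.example.org", "/atlas/file")
	fmt.Println(a)      // /osg-htc.org/atlas/file
	fmt.Println(a != b) // true: same path, different federations
}
```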
Wire the in-memory lot index into the persistent cache so every cached object is attributed to its owning storage lot. CacheMetadata gains a LotID field (map-keyed msgpack, backward compatible); PersistentCache accepts an optional lotman core manager, builds the longest-prefix lot index from it, and resolves each object's federation-qualified path to a LotID at ingest, recording it in the object's metadata at all ingest paths (stat-init, no-store, disk- and inline-finalize). A RebuildLotIndex hook refreshes the index when lots change. Lot tracking is opt-in: with no manager configured, getLotID returns 0 and cache behavior is unchanged. The manager's lifecycle and lot bootstrap are owned elsewhere; the cache only consumes it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When lot tracking is enabled, attribute cache usage to the owning storage lot instead of the first-path-component namespace, reusing the existing per-bucket accounting rather than introducing a parallel key space. getNamespaceID now resolves an object to its lot name (longest-prefix over federation-qualified lot paths) and maps that name through the existing, persisted namespace-mapping table to a stable bucket id; the BadgerDB usage (u:) and LRU (l:) key formats are unchanged, so all usage/eviction machinery works as-is and ids survive restarts. When lot tracking is disabled the bucket remains the legacy namespace prefix, preserving namespace fairness, so a given cache instance buckets either entirely by lot or entirely by namespace. The lot index is reduced to pure path->lot-name resolution (the cache owns id assignment); lotIDOf derives an object's LotID from its bucket; meta.LotID is recorded at every ingest site. No cache schema-version bump is required; on first enable, pre-existing namespace-bucket counters simply age out via LRU. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the reconciliation that feeds the lotman core the cache's actual occupancy so quota and eviction-priority queries are meaningful. syncLotUsage reads the per-(StorageID, bucket) byte counters, maps each accounting bucket back to its lot, sums across storage directories, and writes the absolute self-byte usage for every lot the core knows about -- so a lot with no cached bytes is reset to zero -- letting the core recompute parent/child rollups. A background loop runs it on a fixed interval (default one minute), wired into the persistent cache constructor and a no-op when lot tracking is disabled; the routine is also safe to invoke on demand before an eviction pass. Object-count usage is not yet synced (the cache tracks bytes per bucket only); that will accompany the object-count-capped monitoring lot. Tested with a pure aggregation unit test and an end-to-end test that writes usage into a real cache database and verifies the resulting per-lot self, children rollups, cross-directory aggregation, and reset-to-zero on drain. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
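The pure aggregation step (bucket counters to per-lot absolute usage, with lots that have no bytes reset to zero) can be sketched as follows; `counterKey`, its field types, and `aggregateLotBytes` are hypothetical:

```go
package main

import "fmt"

// counterKey is a hypothetical (StorageID, bucket) pair for one byte counter.
type counterKey struct {
	StorageID uint8
	Bucket    uint32
}

// aggregateLotBytes sums per-directory bucket counters into per-lot totals.
// Every lot the core knows about gets an entry, so a lot with no cached
// bytes is written as 0 instead of keeping a stale value.
func aggregateLotBytes(counters map[counterKey]int64, bucketToLot map[uint32]string, knownLots []string) map[string]int64 {
	out := make(map[string]int64, len(knownLots))
	for _, lot := range knownLots {
		out[lot] = 0
	}
	for k, n := range counters {
		lot, ok := bucketToLot[k.Bucket]
		if !ok {
			continue // bucket does not map back to a lot
		}
		if _, known := out[lot]; !known {
			continue // only write usage for lots the core knows about
		}
		out[lot] += n // sums across storage directories
	}
	return out
}

func main() {
	counters := map[counterKey]int64{
		{StorageID: 1, Bucket: 10}: 100,
		{StorageID: 2, Bucket: 10}: 50, // same lot, second directory
		{StorageID: 1, Bucket: 11}: 7,
	}
	bucketToLot := map[uint32]string{10: "atlas", 11: "cms"}
	// fmt prints maps with sorted keys.
	fmt.Println(aggregateLotBytes(counters, bucketToLot, []string{"atlas", "cms", "default"}))
	// map[atlas:150 cms:7 default:0]
}
```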
When lot tracking is enabled, eviction now frees over-quota and expired lots first instead of relying solely on the greediest bucket. An optional eviction planner (implemented by the persistent cache) is consulted by checkAndEvict: before a pass that will actually evict, it syncs current usage into the lot store, then Tier 1 drains lots in priority order -- past deletion time, past expiration, over dedicated+opportunistic, then over dedicated -- restricted to lots present in the directory being relieved; Tier 2 falls back to the existing greediest-bucket loop for any remainder. With no planner installed (lot tracking disabled) eviction behaves exactly as before. Tested for priority ordering, per-directory filtering, and the no-manager case, with the existing eviction suite still passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
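Tier 1's ordering and per-directory filter can be sketched as below. The `lotState` flags are hypothetical stand-ins for the core's LotsPast* query results, and listing each lot once under its most urgent class is an assumption:

```go
package main

import (
	"fmt"
	"sort"
)

// lotState is a hypothetical snapshot of one lot as seen by the planner.
type lotState struct {
	Name           string
	PastDeletion   bool
	PastExpiration bool
	OverDedOpp     bool // over dedicated+opportunistic
	OverDedicated  bool
	InDirectory    bool // has objects in the directory being relieved
}

// priority returns the tier-1 class (lower drains first), or -1 if the lot
// is not a tier-1 candidate.
func priority(l lotState) int {
	switch {
	case l.PastDeletion:
		return 0
	case l.PastExpiration:
		return 1
	case l.OverDedOpp:
		return 2
	case l.OverDedicated:
		return 3
	}
	return -1
}

// tier1Order lists candidate lots in drain order, restricted to lots present
// in the directory under pressure; the stable sort keeps input order within
// a class.
func tier1Order(lots []lotState) []string {
	var cands []lotState
	for _, l := range lots {
		if l.InDirectory && priority(l) >= 0 {
			cands = append(cands, l)
		}
	}
	sort.SliceStable(cands, func(i, j int) bool { return priority(cands[i]) < priority(cands[j]) })
	names := make([]string, len(cands))
	for i, l := range cands {
		names[i] = l.Name
	}
	return names
}

func main() {
	lots := []lotState{
		{Name: "a", OverDedicated: true, InDirectory: true},
		{Name: "b", PastExpiration: true, InDirectory: true},
		{Name: "c", PastDeletion: true}, // not in this directory
		{Name: "d", OverDedOpp: true, InDirectory: true},
		{Name: "e", InDirectory: true}, // nothing to reclaim
	}
	fmt.Println(tier1Order(lots)) // [b d a]
}
```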
Add object-count cap enforcement, the mechanism behind the monitoring lot's bounded object count. CountLRUEntries reports the exact object count for a (storage, bucket) pair via a bounded keys-only scan of the LRU index -- each cached object has exactly one LRU entry regardless of chunking, so no hot-path counter is needed. trimObjectCaps periodically (default one minute), and independent of disk pressure, brings every lot with a finite max_num_objects back to its cap by counting its bucket across storage directories (including inline) and evicting the oldest excess; startObjectCapTrim runs it and is wired into the cache constructor as a no-op when lot tracking is disabled. Enforcement is cache-side (using the lot's cap from the core plus live LRU counts); populating the core's object usage and an end-to-end over-cap eviction test (which needs the storage-manager harness) are follow-ups. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the libLotMan.so / purego FFI binding with the in-tree native engine for all lot operations. The adapter wrappers now call the core manager directly; the engine self-migrates its schema and selects its database by cache mode (a dedicated, daemon-shareable SQLite for the XRootD cache; the shared server database otherwise). Drop the platform build tags now that the engine is pure Go: the package compiles on linux, darwin, windows, and ppc64le. The single OS-specific call (filesystem free-space probing) is split into unix and non-unix files. Remove the ebitengine/purego dependency. Store an unset max_num_objects as 0 rather than letting it fall through to the column default, matching the dedicated/opportunistic byte axes: 0 means "no quota" and the unbounded sentinel (-1) is always set explicitly. The pelican-server binary remains CGO-free and statically linkable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Counting a lot's objects by scanning the LRU index on every trim is O(objects) and far too heavy at millions of objects. Instead, piggyback object counting on the periodic metadata scan, which already walks every metadata entry to reconcile per-bucket byte usage: accumulate per-(StorageID, bucket) object counts in the same pass and reconcile them into a cheap stored counter (oc: keys). The object-cap trim now reads that counter (O(buckets)) and decrements it after evicting so a re-run before the next scan does not over-evict; per-lot object counts are also pushed to the lotman core so its object usage and over-object-cap queries are populated. Counts are as fresh as the last metadata scan, so a lot may briefly exceed its cap between scans before the trim brings it back -- an acceptable approximation for a rolling-window cap, and the cost of avoiding an O(objects) scan. Removes the previous CountLRUEntries LRU-scan approach. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
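The counter arithmetic behind the trim is small. A sketch, assuming a hypothetical per-lot counter map and that only a positive cap is enforced (this sketch skips both 0 and the -1 sentinel):

```go
package main

import "fmt"

// excessObjects returns how many of the oldest objects must be evicted to
// bring count back to maxObjects. Non-positive caps are not enforced here.
func excessObjects(count, maxObjects int64) int64 {
	if maxObjects <= 0 || count <= maxObjects {
		return 0
	}
	return count - maxObjects
}

// recordEvicted lowers the stored counter by what was actually evicted, so a
// rerun before the next metadata scan does not evict the same excess again.
func recordEvicted(counts map[string]int64, lot string, evicted int64) {
	counts[lot] -= evicted
	if counts[lot] < 0 {
		counts[lot] = 0
	}
}

func main() {
	counts := map[string]int64{"monitoring": 620}
	n := excessObjects(counts["monitoring"], 500)
	fmt.Println(n) // 120
	recordEvicted(counts, "monitoring", n)
	fmt.Println(excessObjects(counts["monitoring"], 500)) // 0: rerun is a no-op
}
```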
Add the remaining configuration surface for native lot management on the persistent (V2) cache and complete object-cap enforcement:
- Cache.LotUsageReconcileInterval (duration, default 1m): how often the cache pushes per-lot usage into the lot manager and trims lots over their object-count cap. Plumbed through PersistentCacheConfig and the cache launcher; the maintenance loops use it instead of a hardcoded interval.
- Lotman.MonitoringLotMaxObjects (int, default 500): the V2 cache now auto-creates a non-expiring "monitoring" lot owning /pelican/monitoring and bounds its object count as a rolling window enforced by the cache's trim loop. The lot has no dedicated bytes, so it is also a first eviction target under disk pressure. Not created on the V1 (xrootd) cache, which has no federation-aware monitoring lot.
- Lotman.AutoCreateOnDiscover (bool, default true): gate auto-creation of lots for newly-advertised federation prefixes in the renewal routine. Set false at strict-reservations sites; uncovered prefixes fall to the default lot.
- Lotman.DefaultLotOpportunisticGB (int, default 0): opportunistic quota granted to the catch-all default lot. 0 preserves the historical behavior (unlotted data reclaimed first); a positive value (or -1 for unbounded) lets the cache retain unlotted data opportunistically.
- Deprecate Lotman.LibLocation: the native in-process engine loads no shared library, so the value is accepted but ignored.
Tests cover the over-cap eviction path end-to-end against a real storage manager, monitoring-lot and default-lot quota propagation, and the auto-create gate.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ngine
Package the lot engine as a drop-in libLotMan shared library so external C consumers (notably the XRootD pfc purge plugin) keep working without the original C++ library, while pelican/pelican-server stay CGO-free.
- lotman/lotjson: extract the lot JSON wire schema and its GB<->bytes conversions (which depend only on lotman/core) out of the Pelican-coupled adapter into a dependency-light package. The adapter keeps its existing type/function names via aliases, so call sites are unchanged.
- lotman/cshared: a cgo c-shared front end re-exposing the historical libLotMan C ABI over the native core.Manager. It opens its lot database from the "lot_home" context key (the same SQLite file the V1 cache uses) and depends only on lotman/core and lotman/lotjson.
- Makefile: `lotman-shared` builds the library natively; `lotman-package` produces its RPM/DEB/APK via a separate goreleaser config. The main pelican/pelican-server builds remain CGO-free.
- Guardrail test asserting pelican-server embeds the native lot engine and never links the cgo library or the old purego binding.
- Document cache storage lots on the cache operator page.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stand up a full federation with a LotMan-enabled V2 cache, cache objects under a namespace, and assert via the lots REST API that the namespace lot's usage is tracked and reported. Exercises object->lot resolution, the usage reconciler, API auth, and the endpoints together. Eviction mechanics remain covered by the deterministic local_cache integration tests (the metadata-scan cadence makes them impractical to assert end-to-end). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a client command set under the `lot` noun (list/get/create/update/delete/reclaim/usage) that drives the cache's lot REST API, mirroring the downtime admin CLI (shared --server/--token flags, admin-token auth, YAML/--json output).
Tests:
- cmd/lot_test.go: URL construction, flag->MPAInput mapping, and each verb against a TLS mock using a config-generated CA.
- lotman/lot_cli_v2_e2e_test.go: builds the pelican binary once (lazily) and drives the CLI with an admin token against a live federation whose V2 cache has lotman enabled, exercising the full CRUD cycle and proving cached bytes are accounted to a lot and reported via `pelican lot usage`.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The lot API authorized admins (admin login cookie) but then passed the federation issuer as the lotman caller, so the core's owner/ancestor check still rejected operations on lots the federation issuer does not own (e.g. creating a lot under the root lot). Admins are meant to override ownership. Route admin requests through the core's existing trusted/system caller (the empty caller, which bypasses the ownership check) via a new authResult.lotmanCaller() helper used by the create/update/delete/reclaim handlers. The admin's federation issuer is still used to derive a new lot's owner; only the ownership check is overridden. Non-admin requests are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
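The helper reduces to a two-way choice. A sketch with a hypothetical `authResult` shape and a hypothetical `newLotOwner` helper; only `lotmanCaller` is named in the commit:

```go
package main

import "fmt"

// authResult is a hypothetical reduction of a handler's auth outcome.
type authResult struct {
	IsAdmin bool
	Issuer  string // federation issuer of the caller
}

// lotmanCaller returns the caller identity passed to the lot core. Admins map
// to the empty (trusted/system) caller, which bypasses the core's ownership
// check; everyone else is checked as their issuer.
func (a authResult) lotmanCaller() string {
	if a.IsAdmin {
		return ""
	}
	return a.Issuer
}

// newLotOwner still derives a new lot's owner from the issuer, admin or not;
// only the ownership check is overridden.
func (a authResult) newLotOwner() string { return a.Issuer }

func main() {
	admin := authResult{IsAdmin: true, Issuer: "https://fed.example.org"}
	user := authResult{Issuer: "https://issuer.example.org"}
	fmt.Printf("%q %q\n", admin.lotmanCaller(), admin.newLotOwner())
	fmt.Printf("%q\n", user.lotmanCaller())
}
```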
SQLite databases use an unbounded connection pool, and GORM transactions
default to BEGIN DEFERRED: they take a read lock and only upgrade to a write
lock on the first write. Two pooled connections each doing read-then-write
concurrently deadlock on that upgrade and return SQLITE_BUSY ("database is
locked") immediately -- busy_timeout cannot resolve a mutual upgrade, so it
never applies. This surfaced as intermittent write failures (cache lot-usage
reconcile, key renewal, lot API writes) under concurrency.
Append _txlock=immediate to the shared SQLite DSN so every explicit
transaction takes the write lock up front and serializes through busy_timeout
instead of deadlocking. WAL and busy_timeout were already applied correctly;
this fixes the deferred read->write upgrade, not a missing pragma. Autocommit
reads and ReadOnly transactions are unaffected and stay concurrent under WAL.
Add WriteTx/ReadTx helpers in database/utils to make lock intent explicit:
WriteTx is the default write path; ReadTx marks a multi-statement read
ReadOnly so it stays concurrent. Migrate the GORM write transactions in
database, registry, and oauth2/issuer to WriteTx. (lotman/core keeps
db.Transaction directly because it must not import pelican packages;
client_agent/store uses raw database/sql. Both still get the DSN fix.)
Add regression tests: concurrent read-then-write transactions complete with
no SQLITE_BUSY and no lost updates (reproduces ~465/480 failures without the
fix, 0 with it), plus DSN-content and WriteTx/ReadTx behavior checks.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
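Adding the parameter can be done without disturbing existing options. A sketch (hypothetical helper name; the other DSN parameters shown are placeholders) that appends `_txlock=immediate` unless the DSN already chooses a lock mode:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// withImmediateTxLock appends _txlock=immediate to a "path?key=value" style
// SQLite DSN. Existing parameters are kept byte-for-byte, and a DSN that
// already sets _txlock is returned unchanged.
func withImmediateTxLock(dsn string) string {
	_, rawQuery, hasQuery := strings.Cut(dsn, "?")
	if q, err := url.ParseQuery(rawQuery); err == nil && q.Has("_txlock") {
		return dsn // respect an explicit choice
	}
	if hasQuery {
		return dsn + "&_txlock=immediate"
	}
	return dsn + "?_txlock=immediate"
}

func main() {
	fmt.Println(withImmediateTxLock("pelican.sqlite"))
	fmt.Println(withImmediateTxLock("pelican.sqlite?_busy_timeout=5000"))
	fmt.Println(withImmediateTxLock("pelican.sqlite?_txlock=deferred"))
}
```

With BEGIN IMMEDIATE, SQLite takes the write lock when the transaction begins, so a competing writer waits under busy_timeout at BEGIN instead of failing later on a read-to-write upgrade.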
The XRootD pfc purge plugin (libXrdPurgeLotMan) links libLotMan at a fixed C ABI. lotman v0.1.0 added query_time / include_reclaimed / hierarchical parameters to the eviction queries and update_lot_usage_by_dir; a plugin built against the older v0.0.4 ABI passes fewer arguments, so a new-ABI libLotMan reads shifted registers and segfaults while converting the returned lot list during purge.
Build libLotMan at whichever ABI matches the deployed plugin:
- Split the version-sensitive exports into export_api_v1.go (current ABI, the default) and export_api_legacy.go (old v0.0.4 signatures), selected by the lotman_legacy_api build tag. Shared JSON parsing moves to parseDirUsage in export.go.
- Makefile lotman-shared resolves LOTMAN_ABI (auto|new|legacy, default auto): auto sniffs the installed lotman header (new iff update_lot_usage_by_dir takes an int64_t query_time) and adds -tags lotman_legacy_api when the old ABI is in use; override with LOTMAN_ABI=new|legacy.
Verified: a legacy-ABI libLotMan plus the stock old purge plugin runs the V1 cache eviction path with no crash and real lot-based eviction.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- e2e_fed_tests: a V2 persistent-cache federation test that, with LotMan enabled, downloads past the high watermark and asserts the watermark eviction loop reclaims on-disk object bytes back toward the low watermark (8 MiB downloaded settles to ~2.6 MiB). Exercises the full pipeline: download -> object/lot accounting -> eviction.
- local_cache: a deterministic selectivity test that drives the real EvictionManager.checkAndEvict with the lot planner installed and proves the over-quota lot's objects are evicted while a protected (under-quota) lot's objects are spared. Sized so tier-1 (lot priority) alone reaches the low watermark, so the lot-agnostic greediest-bucket fallback never runs.
- lotman: the V1 XRootD cache e2e builds the C ABI matching the installed plugin and skips cleanly without root / xrootd / the purge plugin.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
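```go
// Stop evicting once this directory is at or below its low watermark.
if dirUsage <= 0 || uint64(dirUsage) <= limits.lowWater {
	break
}
// Bytes that still have to be freed to reach the low watermark.
overhead := dirUsage - int64(limits.lowWater)
```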
This started as an exploration of how we could integrate Lotman into "Cache V2" but stumbled into a relatively clever idea: flip the Go / C++ relationship upside down in the lotman library.
The core of this is a machine translation of the Lotman C++ code into Golang (plus a few cleanups where I couldn't help myself: the SQLite tables aren't ported over identically, but the general concepts are). That exposes a standalone shared library that `xrootd-lotman` can link against and a Golang-based module that can be used directly in `pelican-server`. The latter gives us the opportunity to drop the `purego` bindings completely and return `pelican-server` to a standalone binary.
Because it was cheap, this also starts building out a CLI and integration tests for Lotman itself: there's one end-to-end test that tries to put all this together.
Pretty nifty stuff!