Add remote transfer server functionality #3509
Draft
bbockelm wants to merge 22 commits into
Record which source (Authorization header, login cookie, or authz query parameter) a verified token arrived from on the VerifyResult, so downstream handlers can apply source-specific policy. Add scope-parsing helpers (including hierarchical resource-scope matching), the pelican.transfer token scope (also accepted as a standard scope by the embedded issuer), and the TransferType server role. These leaf-level additions are depended upon by the config and transfer changes that follow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
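Hierarchical resource-scope matching means a scope granted on a path also covers everything beneath it. Below is a minimal sketch. It assumes scopes of the form `<authz>:<path>` (WLCG-style, e.g. `storage.read:/data`), and the helper names `parseScope` and `scopeCovers` are hypothetical. The PR's actual helpers may parse and compare differently.

```go
package main

import (
	"path"
	"strings"
)

// resourceScope is a hypothetical parsed form of a scope such as
// "storage.read:/data/run1"; the type and field names are illustrative.
type resourceScope struct {
	Authz    string // e.g. "storage.read"
	Resource string // cleaned absolute path; "/" if the scope has none
}

// parseScope splits "authz:resource". A scope without a resource
// (e.g. "pelican.transfer") is treated as applying to "/".
func parseScope(s string) resourceScope {
	authz, res, found := strings.Cut(s, ":")
	if !found || res == "" {
		res = "/"
	}
	return resourceScope{Authz: authz, Resource: path.Clean("/" + res)}
}

// scopeCovers reports whether a granted scope authorizes a requested one:
// the authorizations must match, and the requested resource must equal the
// granted one or lie beneath it on a path-component boundary.
func scopeCovers(granted, requested string) bool {
	g, r := parseScope(granted), parseScope(requested)
	if g.Authz != r.Authz {
		return false
	}
	if g.Resource == "/" || g.Resource == r.Resource {
		return true
	}
	return strings.HasPrefix(r.Resource, g.Resource+"/")
}
```

Matching on the `/` boundary is what prevents a grant on `/data` from also authorizing `/database`.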
Refactor the client credential/token configuration so it is no longer OSDF-specific: rename OSDFConfig to CredentialConfig (keeping OSDFConfig as a deprecated alias) and key OAuth client and transfer-server entries by federation discovery URL via GetFederationCredentials / FindOauthClient / FindTransferServer. PrefixEntry now embeds a reusable ClientRegistration struct, and token generation accepts an explicit DiscoveryURL. Make token acquisition pluggable: add a TokenProvider interface plus a StaticTokenProvider, allow a tokenGenerator to delegate to an external provider (SetExternalProvider), and expose WithTokenProvider / WithSourceTokenProvider transfer options. The PrefixEntry/ClientRegistration restructuring and its construction sites are mutually coupled, so the config and token-acquisition changes land together. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the transfer server: an authenticated HTTP API over the client_agent transfer engine, with per-user job ownership, an encrypted credential store, a per-issuer OAuth2 client registry, and OAuth2 credential bootstrap flows (RFC 8693 token exchange and authorization code). Includes the database migration, configuration parameters (Transfer.* and Origin.EnableTransferAPI), launcher wiring for both standalone and origin-embedded deployment, the generated Swagger paths, and unit tests. The end-to-end TPC test is added separately in the following commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add an end-to-end test that stands up a federation and exercises a third-party copy driven through the transfer server, covering the origin-pull and direct-credential paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the `pelican transfer` command group for managing a transfer server's resources: credential and OAuth2-client management, and credential bootstrapping. The CLI authenticates to the transfer server via dynamic client registration plus the device-code flow, and drives the server's token-exchange and authorization-code bootstrap endpoints. Includes the generated command-reference documentation. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a `pelican object copy` subcommand that drives client.DoCopy to perform a server-to-server (third-party) copy between two federation URLs. It reuses the transfer CLI's credential lookup/bootstrap helpers, so it follows the transfer CLI in the series. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the transfer server pages to the web frontend (credentials, jobs, and landing pages plus navigation) and register "transfer" as a server UI page in the Go web server. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mdformat (CommonMark, no table plugin) collapses GFM tables to a single line on commit. Convert the design doc's tables to bullet lists so they render and survive the formatter. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a WalletSession to the client agent that governs access to the user's encrypted credential file. The agent never prompts for a password; the password is supplied over the (user-owned unix socket) wallet API and cached via the config keyring, so the agent can decrypt and rewrite the wallet without a TTY. Adds config.ForgetPassword() to clear the cached password (locking the wallet without restarting), and three endpoints under /api/v1.0/transfer-agent/wallet: status, open, close. The wallet is locked on server shutdown. This is the foundation for agent-side credential use and background token refresh; the CLI is responsible for warming the wallet (acquiring the right tokens) before submitting a job. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a NonInteractive mode to token acquisition: TokenGenerationOpts.NonInteractive and a WithNonInteractive transfer option. When set, AcquireToken still uses cached, refreshable, or locally-generatable tokens but fails with an actionable error instead of falling back to the interactive OAuth2 device-code flow. Wire it into the client agent: transfers always run non-interactively (the agent has no TTY), and when the wallet is open the agent enables OAuth token acquisition so the client library selects/refreshes the right per-destination token from the user's wallet. When the wallet is locked, acquisition is off so the daemon never blocks on a decrypt/device-code prompt (explicit tokens and environment discovery still work). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
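The acquisition order described above can be sketched as a fallback chain that stops before the device-code flow when non-interactive mode is set. Only the `NonInteractive` name comes from the PR. The `acquireOpts` struct, the `sources` list, and `errNoToken` are hypothetical simplifications.

```go
package main

import "errors"

// errNoToken signals "this source has nothing; try the next one".
var errNoToken = errors.New("no token available from this source")

// acquireOpts is a hypothetical stand-in for TokenGenerationOpts.
type acquireOpts struct {
	NonInteractive bool
}

// acquireToken tries non-interactive sources first (e.g. cache, refresh,
// local generation) and only then considers the interactive device-code flow.
func acquireToken(opts acquireOpts, sources []func() (string, error), deviceCode func() (string, error)) (string, error) {
	for _, src := range sources {
		tok, err := src()
		if err == nil {
			return tok, nil
		}
		if !errors.Is(err, errNoToken) {
			return "", err // a real failure, not just a miss
		}
	}
	if opts.NonInteractive {
		// Fail with an actionable message instead of blocking on a prompt.
		return "", errors.New("no usable token and interactive login is disabled; " +
			"acquire a credential from a terminal first, then retry")
	}
	return deviceCode()
}
```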
Add proactive, non-interactive refresh of stored OAuth2 tokens nearing expiry so queued jobs always have a usable token by the time they run.
- Extract a reusable refreshTokenEntry helper from AcquireToken (refresh-token grant for a single stored token; no behavior change to AcquireToken).
- Add client.RefreshExpiringCredentials: scans the wallet for refreshable tokens within a window, resolves each credential's issuer (directly when the prefix is an issuer URL, otherwise via a Director lookup for the namespace), refreshes, and writes the wallet back once.
- Run it from the client agent every 10 minutes (window 30 minutes) while the wallet is open, serialized so cycles do not overlap.
Note: the per-transfer client path still lazily refreshes *expired* tokens. The 30-minute window keeps the two paths from typically acting on the same credential at once; on a wallet-write race the worst case is a lost refresh-token rotation (re-acquire needed). File-level locking of the wallet is left as a follow-up if needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add WalletStatus, OpenWallet, and CloseWallet to the agent API client so a submitting client can unlock the agent's credential wallet (and check/lock it) before submitting jobs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add WithCredentialFileLock (an in-process mutex plus a best-effort advisory file lock via gofrs/flock) and UpsertPrefixEntry, which re-reads the wallet under the lock and replaces a single prefix's OAuth client entry before writing. This prevents lost updates when several read-modify-write cycles run concurrently (notably the agent's background refresh vs. per-transfer lazy refresh), while preserving concurrent changes to other prefixes. Route AcquireToken's credential saves and the background refresh through the locked per-prefix upsert. The lock is held only around the brief write (not network calls); writes remain atomic (temp-file-and-rename), so if the advisory lock is unavailable the code proceeds optimistically. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When an object get/put/copy/prestage is run with --async, the (interactive) CLI now warms the user's wallet with the tokens the transfer needs and opens the client agent's wallet before submitting, so the agent — which has no terminal — can authorize and refresh the transfer non-interactively. Sources are warmed with a read credential and destinations with a write credential; public namespaces and jobs given an explicit --token are skipped. Also add `pelican client-agent warm <url>...` to acquire credentials and unlock the agent's wallet on demand (with --write for upload scopes). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
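The warming rule can be stated as a small function: sources need a read credential, destinations need a write credential, and public namespaces or jobs with an explicit `--token` need nothing. The sketch below uses a hypothetical `credNeed` type and an `isPublic` callback standing in for the namespace lookup. It deliberately uses read/write labels rather than guessing the exact storage scope strings.

```go
package main

// credNeed is a hypothetical description of one credential to warm.
type credNeed struct {
	URL   string
	Write bool // false: read credential (source); true: write credential (destination)
}

// credentialsToWarm lists what the CLI would acquire before an --async submit.
func credentialsToWarm(sources, dests []string, explicitToken bool, isPublic func(string) bool) []credNeed {
	if explicitToken {
		return nil // the job already carries its own token
	}
	var needs []credNeed
	for _, s := range sources {
		if !isPublic(s) {
			needs = append(needs, credNeed{URL: s})
		}
	}
	for _, d := range dests {
		if !isPublic(d) {
			needs = append(needs, credNeed{URL: d, Write: true})
		}
	}
	return needs
}
```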
The embedded issuer writes the expiry columns of its token, device-code, and JWT-assertion tables with the local clock (time.Now().Add(...)), but the garbage-collection cutoffs used time.Now().UTC(). The glebarez/SQLite driver stores time.Time as offset-bearing strings that SQLite compares lexically, so on hosts whose timezone is not UTC the GC silently skipped expired rows and deleted still-valid ones (visible as TestExpired*GarbageCollection failures on non-UTC developer machines; CI runs in UTC and passed). Use the same local basis for the GC cutoffs so the writes and the comparisons agree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements transfer-server authentication via a server-level "local" issuer that mints pelican.transfer tokens, decoupled from per-namespace data issuers so it works even for a server with only public exports (and, ultimately, a transfer-only server). This is the embedded-issuer + group-authorization approach: identity stays a server concern, while "may use the transfer API" is an authorization gate on the local issuer.
oauth2/issuer:
- Mint the iss claim and OIDC discovery from each provider's own IssuerURL rather than always IssuerURLForNamespace(Namespace); behavior-neutral for existing per-namespace data issuers.
- Register a server-level local issuer provider (iss=config.GetLocalIssuerUrl) under the reserved /.transfer route, so the transfer middleware's existing LocalIssuer check accepts its tokens with no new trust code.
- Gate the pelican.transfer scope on Transfer.EnabledGroups at issuance (any authenticated user when unset) instead of granting it unconditionally.
transfer / cmd:
- Advertise the local issuer's discovery URL from /ping; the CLI discovers it instead of the first data-namespace issuer.
- Move the shared OAuth2 callback under /api/v1.0/callback so a co-located director's ShortcutMiddleware does not treat it as an object request (404).
- Request storage scopes in the auth-code credential bootstrap so the data movement is actually authorized (was only offline_access).
- Drop the transfer_jobs.agent_job_id foreign key: the server runs its TransferManager with an in-memory store, so no jobs rows are persisted.
Supporting: SSRF transport for issuer-metadata fetches, the transfer component in parameter validation, dead-code/lint cleanup, and TPC e2e test updates (namespace-aware device approval, start-URL handling, --wait, scoped tokens). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
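The issuance-time gate can be sketched as a filter over the requested scopes. `pelican.transfer` is kept only if `Transfer.EnabledGroups` is unset (any authenticated user) or shares a group with the user. The function name and argument shapes are hypothetical, and silently dropping the scope, rather than rejecting the request, is a choice made for this sketch.

```go
package main

const transferScope = "pelican.transfer"

// filterTransferScope removes pelican.transfer from the requested scopes
// unless the user may use the transfer API.
func filterTransferScope(requested, userGroups, enabledGroups []string) []string {
	allowed := len(enabledGroups) == 0 // unset: any authenticated user
	for _, g := range userGroups {
		for _, e := range enabledGroups {
			if g == e {
				allowed = true
			}
		}
	}
	out := make([]string, 0, len(requested))
	for _, s := range requested {
		if s == transferScope && !allowed {
			continue
		}
		out = append(out, s)
	}
	return out
}
```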
Replace the placeholder config.EncryptString/DecryptString secret encryption with AES-GCM under a key derived from the server master key via database.LoadOrCreateMasterKey + database.DeriveSubKey (HKDF), matching how the embedded OIDC issuer derives its sub-keys. This gives credential and OAuth client secrets a single database-managed root of trust instead of the separate config-file scheme. The key is derived lazily on first use and cached, so no launcher/init change is required. There is no data migration: secrets written under the old scheme cannot be decrypted under the new one, which is acceptable while the transfer server is still in development. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Make sure the transfer server can run standalone.
- Fix test initialization (ensure the master-key tables are present and add nil guards).
This gives the client agent concept a pretty significant upgrade: a standalone HTTP server can now queue and run transfers on a user's behalf, just as the client agent does.
Perhaps unsurprisingly, the challenge here is all the credential management. One needs to delegate access tokens to the transfer server (there are multiple ways to do this, e.g., RFC 8693 token exchange or the authorization-code flow), and the server must keep those tokens alive in case there's a significant queue and the transfer isn't started for several hours.
This did necessitate two cleanups:
Marking as a draft for now because I'd really like this to land in v7.28, after the user/group redesign. However, I felt it'd be useful to share now (and useful to run CI tests to whip it into shape...)!