Add remote transfer server functionality #3509

Draft
bbockelm wants to merge 22 commits into PelicanPlatform:main from bbockelm:xfer-agent


bbockelm (Collaborator) commented Jun 6, 2026

This gives the client agent concept a significant upgrade: it allows a standalone HTTP server to do the same job, running transfers on a user's behalf.

Perhaps unsurprisingly, the hard part is credential management. Access tokens must be delegated to the transfer server (there are multiple ways to do this), and the server must keep them alive in case there's a significant queue and a transfer doesn't start for several hours.

This did necessitate two cleanups:

  1. Client - allow it to take a "token provider" interface that will provide a token on demand when needed for the transfer (for example, reading it from the database).
  2. Credential config - make it possible to store non-OSDF credentials so cross-federation transfers can occur unambiguously.

Marking this as a draft for now because I'd really like it to land in v7.28, after the user/group redesign. However, I felt it would be useful to share now (and useful to run CI tests to whip it into shape...)!

bbockelm and others added 20 commits June 6, 2026 09:32
Record on the VerifyResult which source (Authorization header, login cookie,
or authz query parameter) a verified token arrived from, so downstream
handlers can apply source-specific policy. Add scope-parsing helpers
(including hierarchical resource-scope matching), the pelican.transfer token
scope (also accepted as a standard scope by the embedded issuer), and the
TransferType server role. These leaf-level additions are depended upon by
the config and transfer changes that follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
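Hierarchical resource-scope matching, in the WLCG-token sense, means a scope such as `storage.read:/foo` authorizes `/foo` and everything beneath it, but not a sibling like `/foobar`. A minimal sketch, assuming scopes of the form `<authz>:<path>`; `parseScope` and `scopeCovers` are hypothetical names, not the helpers this commit adds:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parseScope splits a scope such as "storage.read:/foo" into its
// authorization ("storage.read") and resource ("/foo"). A scope with no
// resource part, such as "pelican.transfer", yields an empty resource.
func parseScope(scope string) (string, string) {
	authz, resource, found := strings.Cut(scope, ":")
	if !found {
		return scope, ""
	}
	return authz, resource
}

// scopeCovers reports whether a granted scope authorizes wantAuthz on
// wantPath. Matching is hierarchical on path components: "/foo" covers
// "/foo" and "/foo/bar" but not the sibling "/foobar".
func scopeCovers(granted, wantAuthz, wantPath string) bool {
	authz, resource := parseScope(granted)
	if authz != wantAuthz || resource == "" {
		return false
	}
	base := path.Clean(resource)
	p := path.Clean(wantPath) // normalizes "/foo/../etc" to "/etc"
	if base == "/" {
		return strings.HasPrefix(p, "/")
	}
	return p == base || strings.HasPrefix(p, base+"/")
}

func main() {
	fmt.Println(scopeCovers("storage.read:/foo", "storage.read", "/foo/bar"))  // true
	fmt.Println(scopeCovers("storage.read:/foo", "storage.read", "/foobar"))   // false
	fmt.Println(scopeCovers("storage.read:/foo", "storage.create", "/foo/x")) // false
}
```

Component-wise matching (rather than a raw string prefix) is what keeps `/foobar` from inheriting a grant for `/foo`.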
Refactor the client credential/token configuration so it is no longer
OSDF-specific: rename OSDFConfig to CredentialConfig (keeping OSDFConfig as
a deprecated alias) and key OAuth client and transfer-server entries by
federation discovery URL via GetFederationCredentials / FindOauthClient /
FindTransferServer. PrefixEntry now embeds a reusable ClientRegistration
struct, and token generation accepts an explicit DiscoveryURL.

Make token acquisition pluggable: add a TokenProvider interface plus a
StaticTokenProvider, allow a tokenGenerator to delegate to an external
provider (SetExternalProvider), and expose WithTokenProvider /
WithSourceTokenProvider transfer options. The PrefixEntry/ClientRegistration
restructuring and its construction sites are mutually coupled, so the config
and token-acquisition changes land together.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the transfer server: an authenticated HTTP API over the client_agent
transfer engine, with per-user job ownership, an encrypted credential store,
a per-issuer OAuth2 client registry, and OAuth2 credential bootstrap flows
(RFC 8693 token exchange and authorization code). Includes the database
migration, configuration parameters (Transfer.* and Origin.EnableTransferAPI),
launcher wiring for both standalone and origin-embedded deployment, the
generated Swagger paths, and unit tests. The end-to-end TPC test is added
separately in the following commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
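For reference, an RFC 8693 token-exchange request is an ordinary form POST to the issuer's token endpoint. This sketch only builds the form body. `tokenExchangeForm` is a hypothetical helper, client authentication is omitted, and whether the issuer also returns a refresh token (for example when `offline_access` is among the requested scopes) depends on the issuer:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Standard identifiers from RFC 8693.
const (
	grantTypeTokenExchange = "urn:ietf:params:oauth:grant-type:token-exchange"
	tokenTypeAccessToken   = "urn:ietf:params:oauth:token-type:access_token"
)

// tokenExchangeForm builds the application/x-www-form-urlencoded body for an
// RFC 8693 token exchange in which the user's access token is the subject
// token. Empty scopes/audience are omitted, since both are optional.
func tokenExchangeForm(subjectToken string, scopes []string, audience string) url.Values {
	form := url.Values{}
	form.Set("grant_type", grantTypeTokenExchange)
	form.Set("subject_token", subjectToken)
	form.Set("subject_token_type", tokenTypeAccessToken)
	form.Set("requested_token_type", tokenTypeAccessToken)
	if len(scopes) > 0 {
		form.Set("scope", strings.Join(scopes, " "))
	}
	if audience != "" {
		form.Set("audience", audience)
	}
	return form
}

func main() {
	form := tokenExchangeForm("<user access token>",
		[]string{"storage.read:/data", "offline_access"}, "")
	// This body would be POSTed to the issuer's token endpoint, with the
	// transfer server authenticating as its registered OAuth2 client.
	fmt.Println(form.Encode())
}
```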
Add an end-to-end test that stands up a federation and exercises a
third-party copy driven through the transfer server, covering the
origin-pull and direct-credential paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the `pelican transfer` command group for managing a transfer server's
resources: credential and OAuth2-client management, and credential
bootstrapping. The CLI authenticates to the transfer server via dynamic
client registration plus the device-code flow, and drives the server's
token-exchange and authorization-code bootstrap endpoints. Includes the
generated command-reference documentation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a `pelican object copy` subcommand that drives client.DoCopy to perform
a server-to-server (third-party) copy between two federation URLs. It reuses
the transfer CLI's credential lookup/bootstrap helpers, so it follows the
transfer CLI in the series.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the transfer server pages to the web frontend (credentials, jobs, and
landing pages plus navigation) and register "transfer" as a server UI page
in the Go web server.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mdformat (CommonMark, no table plugin) collapses GFM tables to a single
line on commit. Convert the design doc's tables to bullet lists so they
render and survive the formatter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a WalletSession to the client agent that governs access to the user's
encrypted credential file. The agent never prompts for a password; the
password is supplied over the (user-owned unix socket) wallet API and cached
via the config keyring, so the agent can decrypt and rewrite the wallet
without a TTY.

Adds config.ForgetPassword() to clear the cached password (locking the
wallet without restarting), and three endpoints under
/api/v1.0/transfer-agent/wallet: status, open, close. The wallet is locked
on server shutdown.

This is the foundation for agent-side credential use and background token
refresh; the CLI is responsible for warming the wallet (acquiring the right
tokens) before submitting a job.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a NonInteractive mode to token acquisition: TokenGenerationOpts.NonInteractive
and a WithNonInteractive transfer option. When set, AcquireToken still uses
cached, refreshable, or locally-generatable tokens but fails with an
actionable error instead of falling back to the interactive OAuth2 device-code
flow.

Wire it into the client agent: transfers always run non-interactively (the
agent has no TTY), and when the wallet is open the agent enables OAuth token
acquisition so the client library selects/refreshes the right per-destination
token from the user's wallet. When the wallet is locked, acquisition is off so
the daemon never blocks on a decrypt/device-code prompt (explicit tokens and
environment discovery still work).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add proactive, non-interactive refresh of stored OAuth2 tokens nearing
expiry so queued jobs always have a usable token by the time they run.

- Extract a reusable refreshTokenEntry helper from AcquireToken (refresh-token
  grant for a single stored token; no behavior change to AcquireToken).
- Add client.RefreshExpiringCredentials: scans the wallet for refreshable
  tokens within a window, resolves each credential's issuer (directly when the
  prefix is an issuer URL, otherwise via a Director lookup for the namespace),
  refreshes, and writes the wallet back once.
- Run it from the client agent every 10 minutes (window 30 minutes) while the
  wallet is open, serialized so cycles do not overlap.

Note: the per-transfer client path still lazily refreshes *expired* tokens.
The 30-minute window keeps the two paths from typically acting on the same
credential at once; on a wallet-write race the worst case is a lost
refresh-token rotation (re-acquire needed). File-level locking of the wallet
is left as a follow-up if needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add WalletStatus, OpenWallet, and CloseWallet to the agent API client so a
submitting client can unlock the agent's credential wallet (and check/lock
it) before submitting jobs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add WithCredentialFileLock (an in-process mutex plus a best-effort advisory
file lock via gofrs/flock) and UpsertPrefixEntry, which re-reads the wallet
under the lock and replaces a single prefix's OAuth client entry before
writing. This prevents lost updates when several read-modify-write cycles run
concurrently (notably the agent's background refresh vs. per-transfer lazy
refresh), while preserving concurrent changes to other prefixes.

Route AcquireToken's credential saves and the background refresh through the
locked per-prefix upsert. The lock is held only around the brief write (not
network calls); writes remain atomic (temp-file-and-rename), so if the
advisory lock is unavailable the code proceeds optimistically.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When an object get/put/copy/prestage is run with --async, the (interactive)
CLI now warms the user's wallet with the tokens the transfer needs and opens
the client agent's wallet before submitting, so the agent — which has no
terminal — can authorize and refresh the transfer non-interactively. Sources
are warmed with a read credential and destinations with a write credential;
public namespaces and jobs given an explicit --token are skipped.

Also add `pelican client-agent warm <url>...` to acquire credentials and
unlock the agent's wallet on demand (with --write for upload scopes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
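The warming rules above (read credentials for sources, write credentials for destinations, skipping public namespaces and jobs with an explicit token) amount to a small planning step before submission. A sketch with hypothetical types `transferEnd` and `warmRequest`; how the CLI actually determines that a namespace is public is not shown:

```go
package main

import "fmt"

// transferEnd is one side of a transfer (fields are illustrative).
type transferEnd struct {
	URL    string
	Public bool // namespace usable without a token
}

// warmRequest names a URL and the kind of credential to pre-acquire.
type warmRequest struct {
	URL   string
	Write bool // false: read credential; true: write credential
}

// planWarm lists the credentials an --async job needs before it is handed
// to the agent: read for sources, write for destinations, none for public
// namespaces, and none at all when the user supplied an explicit --token.
func planWarm(sources, dests []transferEnd, explicitToken bool) []warmRequest {
	if explicitToken {
		return nil
	}
	var plan []warmRequest
	for _, s := range sources {
		if !s.Public {
			plan = append(plan, warmRequest{URL: s.URL})
		}
	}
	for _, d := range dests {
		if !d.Public {
			plan = append(plan, warmRequest{URL: d.URL, Write: true})
		}
	}
	return plan
}

func main() {
	plan := planWarm(
		[]transferEnd{
			{URL: "pelican://fed-a.example.org/public/x", Public: true},
			{URL: "pelican://fed-a.example.org/private/y"},
		},
		[]transferEnd{{URL: "pelican://fed-b.example.org/data/z"}},
		false,
	)
	for _, r := range plan {
		fmt.Println(r.URL, "write:", r.Write)
	}
	// pelican://fed-a.example.org/private/y write: false
	// pelican://fed-b.example.org/data/z write: true
}
```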
The embedded issuer writes the expiry columns of its token, device-code, and
JWT-assertion tables with the local clock (time.Now().Add(...)), but the
garbage-collection cutoffs used time.Now().UTC(). The glebarez/sqlite driver
stores time.Time as offset-bearing strings that SQLite compares lexically, so
on hosts whose timezone is not UTC the GC silently skipped expired rows and
deleted still-valid ones (visible as TestExpired*GarbageCollection failures on
non-UTC developer machines; CI runs in UTC and passed). Use the same local
basis for the GC cutoffs so the writes and the comparisons agree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements transfer-server authentication via a server-level "local" issuer
that mints pelican.transfer tokens, decoupled from per-namespace data issuers
so it works even for a server with only public exports (and, ultimately, a
transfer-only server). This is the embedded-issuer + group-authorization
approach: identity stays a server concern, while "may use the transfer API"
is an authorization gate on the local issuer.

oauth2/issuer:
- Mint the iss claim and OIDC discovery from each provider's own IssuerURL
  rather than always IssuerURLForNamespace(Namespace); behavior-neutral for
  existing per-namespace data issuers.
- Register a server-level local issuer provider (iss=config.GetLocalIssuerUrl)
  under the reserved /.transfer route, so the transfer middleware's existing
  LocalIssuer check accepts its tokens with no new trust code.
- Gate the pelican.transfer scope on Transfer.EnabledGroups at issuance
  (any authenticated user when unset) instead of granting it unconditionally.

transfer / cmd:
- Advertise the local issuer's discovery URL from /ping; the CLI discovers it
  instead of the first data-namespace issuer.
- Move the shared OAuth2 callback under /api/v1.0/callback so a co-located
  director's ShortcutMiddleware does not treat it as an object request (404).
- Request storage scopes in the auth-code credential bootstrap so the data
  movement is actually authorized (previously only offline_access was requested).
- Drop the transfer_jobs.agent_job_id foreign key: the server runs its
  TransferManager with an in-memory store, so no jobs rows are persisted.

Supporting: SSRF transport for issuer-metadata fetches, the transfer component
in parameter validation, dead-code/lint cleanup, and TPC e2e test updates
(namespace-aware device approval, start-URL handling, --wait, scoped tokens).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
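The issuance-time gate on `pelican.transfer` reduces to a simple membership check. A sketch of that rule; the function name and its inputs are hypothetical, and how the issuer resolves a user's groups is not shown:

```go
package main

import "fmt"

// mayReceiveTransferScope applies the issuance gate: if no groups are
// configured (Transfer.EnabledGroups unset), any authenticated user may get
// pelican.transfer; otherwise the user must belong to a listed group.
func mayReceiveTransferScope(authenticated bool, userGroups, enabledGroups []string) bool {
	if !authenticated {
		return false
	}
	if len(enabledGroups) == 0 {
		return true
	}
	enabled := make(map[string]struct{}, len(enabledGroups))
	for _, g := range enabledGroups {
		enabled[g] = struct{}{}
	}
	for _, g := range userGroups {
		if _, ok := enabled[g]; ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mayReceiveTransferScope(true, []string{"physics"}, nil))                              // true
	fmt.Println(mayReceiveTransferScope(true, []string{"physics"}, []string{"transfer-users"}))        // false
	fmt.Println(mayReceiveTransferScope(true, []string{"transfer-users"}, []string{"transfer-users"})) // true
	fmt.Println(mayReceiveTransferScope(false, nil, nil))                                             // false
}
```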
Replace the placeholder config.EncryptString/DecryptString secret encryption
with AES-GCM under a key derived from the server master key via
database.LoadOrCreateMasterKey + database.DeriveSubKey (HKDF), matching how the
embedded OIDC issuer derives its sub-keys. This gives credential and OAuth
client secrets a single database-managed root of trust instead of the separate
config-file scheme.

The key is derived lazily on first use and cached, so no launcher/init change
is required. There is no data migration: secrets written under the old scheme
cannot be decrypted under the new one, which is acceptable while the transfer
server is still in development.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
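The key hierarchy can be sketched with the standard library alone: derive a purpose-specific subkey from a master key with HKDF-SHA256 (RFC 5869), then encrypt with AES-256-GCM, storing the random nonce alongside the ciphertext. `hkdfSHA256` is a hand-rolled stand-in for `database.DeriveSubKey`, and the `info` label is an assumption; the real code may use different parameters:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

// hkdfSHA256 implements RFC 5869 HKDF (extract, then expand) with SHA-256.
// Callers must keep length <= 255*32 bytes, the RFC's limit for SHA-256.
func hkdfSHA256(secret, salt, info []byte, length int) []byte {
	if len(salt) == 0 {
		salt = make([]byte, sha256.Size) // RFC default: HashLen zero bytes
	}
	extract := hmac.New(sha256.New, salt)
	extract.Write(secret)
	prk := extract.Sum(nil)

	// Expand: T(i) = HMAC(PRK, T(i-1) || info || i).
	var okm, prev []byte
	for counter := byte(1); len(okm) < length; counter++ {
		expand := hmac.New(sha256.New, prk)
		expand.Write(prev)
		expand.Write(info)
		expand.Write([]byte{counter})
		prev = expand.Sum(nil)
		okm = append(okm, prev...)
	}
	return okm[:length]
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// sealSecret encrypts plaintext and prepends the random nonce.
func sealSecret(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// openSecret splits off the nonce and authenticates+decrypts the rest.
func openSecret(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	master := make([]byte, 32) // stand-in for the database-managed master key
	if _, err := rand.Read(master); err != nil {
		panic(err)
	}
	key := hkdfSHA256(master, nil, []byte("transfer-credential-secrets"), 32)

	sealed, err := sealSecret(key, []byte("client-secret"))
	if err != nil {
		panic(err)
	}
	plain, err := openSecret(key, sealed)
	fmt.Println(string(plain), err) // client-secret <nil>
}
```

Prepending the nonce keeps each stored secret self-contained, and because GCM authenticates the ciphertext, decrypting with the wrong subkey or a tampered value fails instead of returning garbage.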
- Make sure the transfer server can run standalone.
- Fix test initialization (ensure the master-key tables are present and add
  nil guards).
bbockelm added the enhancement and origin labels on Jun 15, 2026
bbockelm linked an issue on Jun 15, 2026 that may be closed by this pull request

Labels: enhancement (New feature or request), origin (Issue relating to the origin component)

Development: successfully merging this pull request may close the issue "Upgrade client agent to a transfer server".
