
Commit 99a9316

committed
Incorporate Andrew's review feedback
1 parent 1015794 commit 99a9316

1 file changed

Lines changed: 31 additions & 24 deletions


docs/pages/federating-your-data/origin.mdx

@@ -4,7 +4,7 @@ import { Callout } from 'nextra/components'
44

55
# Federating Your Data via a Pelican Origin
66

7-
Pelican users who want to share data within a Pelican federation do so via an [*Origin*](../about-pelican/core-concepts.mdx#origins).
7+
Pelican users who want to share data within a Pelican federation do so via an [***Origin***](../about-pelican/core-concepts.mdx#origins).
88
Origins are a crucial component of Pelican's architecture for several reasons: they act as an adapter between various storage backends and Pelican federations, they provide fine-grained access controls for that data, and they act as a circuit breaker that protects the underlying data repository from large volumes of data movement.
99
That is, they figure out how to take data from wherever it lives (such as a POSIX filesystem, S3 buckets, HTTPS servers, etc.) and transform it into a format that the federation can utilize, all while respecting your data access requirements and protecting the storage they make accessible.
1010

@@ -31,11 +31,13 @@ Pelican Origins have two major components -- one is a data transfer endpoint pow
3131
By design, these two components are hosted behind two separate ports, each dedicated to a distinct function.
3232
Pelican has chosen ports 8443 for data transfers and 8444 for the browser interface as defaults, but you may change these port numbers through your Origin's [configuration file](../parameters.mdx) with parameters [`Origin.Port`](../parameters.mdx#Origin-Port) and [`Server.WebPort`](../parameters.mdx#Server-WebPort), respectively.
3333

34-
In order for Pelican Origins to work properly, **both** of these ports need to be accessible by the federation, which in most cases means they need to be open to the internet. If your server host has a firewall policy in place, please open these two ports for both incoming the outgoing TCP requests.
34+
In order for Pelican Origins to work properly, **both** of these ports need to be accessible by the federation, which in most cases means they need to be open to the internet.
35+
If your server host has a firewall policy in place, please open these two ports for both incoming and outgoing TCP requests.
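
One way to confirm that the firewall change took effect is to attempt a TCP connection to each port from a machine outside your network. The following is a minimal sketch using only the Python standard library; the function name `port_is_open` is hypothetical and not part of Pelican, and a successful connection only shows that the port is reachable, not that the Origin behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `port_is_open("origin.example.com", 8443)` and `port_is_open("origin.example.com", 8444)` from an external host (where `origin.example.com` stands in for your own hostname) should both return `True` once the default ports are open.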
3536

3637
### Preparing TLS Credentials
3738

38-
Data transfers in Pelican rely on HTTPS, the web encryption scheme used by everyone from banks to instagram that's responsible for securely transmitting data between internet-connected computers. To configure the Origin with HTTPS, you'll first need to acquire three things:
39+
Data transfers in Pelican rely on HTTPS, the web encryption scheme used by everyone from banks to Instagram that's responsible for securely transmitting data between internet-connected computers.
40+
To configure the Origin with HTTPS, you'll first need to acquire three things:
3941

4042
- A valid Transport Layer Security (TLS) certificate
4143
- The private key associated with the certificate
@@ -59,11 +61,17 @@ Once you go through the process, locate your credential files and set the follow
5961

6062
Since your TLS certificate is associated with your domain name, you will need to change the default hostname of the Pelican server to match. Set `Server.Hostname` to your domain name (e.g. `example.com`).
6163

62-
### Picking a Federation to join
64+
### Picking a Federation and your Namespace Prefix(es)
6365

64-
Before serving an Origin, you need to decide which [**federation**](../about-pelican/core-concepts.mdx#federations) your data will be accessed through. For example, the Open Science Data Federation (OSDF) is Pelican's flagship federation, and if you are interested in serving an OSDF Origin, you can refer to the [OSDF website](https://osg-htc.org/services/osdf.html) for details about how to join. If you're unsure about which federation to join and aren't ready to run your own federation, this is a good place to start.
66+
Before serving an Origin, you need to decide which [***federation***](../about-pelican/core-concepts.mdx#federations) your data will be accessed through. For example, the Open Science Data Federation (OSDF) is Pelican's flagship federation, and if you are interested in serving an OSDF Origin, you can refer to the [OSDF website](https://osg-htc.org/services/osdf.html) for details about how to join. If you're unsure about which federation to join and aren't ready to run your own federation, this is a good place to start.
6567

66-
All federations are uniquely identified their URL. For example, the OSDF's URL is `https://osg-htc.org` and Pelican command line client commands that interact with objects from this federation would indicate this by using Pelican URLs like `pelican://osg-htc.org/some/namespace/path`.
68+
All federations are uniquely identified by their URL. For example, the OSDF's URL is `https://osg-htc.org` and Pelican command line client commands that interact with objects from this federation would indicate this by using Pelican URLs like `pelican://osg-htc.org/some/namespace/path`.
69+
70+
Once you've picked a federation, you should think about the namespace prefix(es) you'll want to tie your data to.
71+
Namespace prefixes map data from Origins into something resembling a "file path" within their federation.
72+
For example, an S3 bucket with data about whale sightings may be mapped to the namespace prefix `/whales`, such that an object named `2025-sightings.csv` in the bucket would be referred to as `/whales/2025-sightings.csv`.
73+
Its fully-qualified name, scoped to the federation, would then be `pelican://<federation URL>/whales/2025-sightings.csv`.
74+
While it's convenient to think of these prefixes as file paths, it should be noted the comparison is only logical -- there isn't necessarily a `/whales` directory anywhere.
6775

6876
For more information about how to choose prefixes, see [Choosing a Namespace Prefix](./choosing-namespaces.mdx).
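
To make the mapping from a federation URL, namespace prefix, and object name to a Pelican URL concrete, here is a small illustrative sketch. The helper `pelican_url` is hypothetical and is not part of any Pelican client; it only performs the string composition described above and assumes the federation URL is given with a scheme, as in `https://osg-htc.org`.

```python
from urllib.parse import urlsplit


def pelican_url(federation_url: str, prefix: str, object_name: str) -> str:
    """Compose a pelican:// URL from a federation URL, namespace prefix, and object name."""
    host = urlsplit(federation_url).netloc
    if not host:
        raise ValueError(f"federation URL needs a scheme and host: {federation_url!r}")
    # Strip stray slashes so the pieces join with exactly one '/' between them.
    parts = [p.strip("/") for p in (prefix, object_name) if p.strip("/")]
    return f"pelican://{host}/" + "/".join(parts)
```

With the whale-sightings example, `pelican_url("https://osg-htc.org", "/whales", "2025-sightings.csv")` yields `pelican://osg-htc.org/whales/2025-sightings.csv`.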
6977

@@ -114,7 +122,7 @@ Available capabilities include:
114122
- `PublicReads`: When set, objects from the namespace become public and require no authorization to read.
115123
- `Writes`: When included, objects can be written back to the storage backend by Pelican. Write operations _always_ require a valid authorization token.
116124
- `DirectReads`: When included, a namespace indicates that it is willing to serve clients directly and does not require data to be pulled through a cache. Disabling this feature may be useful in cases where the Origin isn't very performant or has to pay egress costs when data moves through it. Note that this is respected by federation central services, but may not be respected by all clients.
117-
- `Listings`: When included, the namespace indicates it will allow object discovery. Be careful when setting this for authorized namespaces, as this will allow anyone to discover the names of objects exported by this namespace. Listings is required if your Origin must support any recursive operations, such as downloading entire directories or object prefixes.
125+
- `Listings`: When included, the namespace indicates it permits object discovery. Authorization requirements for listing objects through an Origin are tied to the values of `Reads` and `PublicReads`. If your namespace sets `Reads`, object discovery will require a valid token, while prefixes with `PublicReads` will not require tokens. This capability is ***required*** if your Origin must support any recursive operations, such as downloading entire directories or object prefixes.
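
The interaction between `Listings`, `Reads`, and `PublicReads` can be summarized as a small decision function. This is only a sketch of the rules stated in the list above, not Pelican's implementation; the function name `listing_access` and its return values are hypothetical, and treating `PublicReads` as taking precedence when both read capabilities are present is an assumption.

```python
def listing_access(capabilities):
    """Describe how object discovery behaves for a namespace's capability list.

    Returns "disabled", "public", or "token".
    """
    caps = set(capabilities)
    if "Listings" not in caps:
        return "disabled"  # object discovery is not permitted at all
    if "PublicReads" in caps:
        return "public"    # assumption: PublicReads wins if both read capabilities are set
    if "Reads" in caps:
        return "token"     # discovery requires a valid authorization token
    return "disabled"      # neither read capability: nothing is exported to list
```

For instance, `listing_access(["Reads", "Listings"])` returns `"token"`, while `listing_access(["PublicReads", "Listings"])` returns `"public"`.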
118126

119127
<Callout type="warning">
120128
Most Origins should have either `Reads` or `PublicReads` enabled. If neither is set, the Origin won't export any data.
@@ -191,7 +199,7 @@ Multiple namespaces can be exported by the same Origin but they must all have th
191199
That is, if the Origin serves files from POSIX, it can _only_ serve files from POSIX and not objects from S3.
192200
However, separate Origins can serve files from POSIX and objects from S3 under the same namespace prefix, allowing the Origin administrators to aggregate data under a unified namespace.
193201

194-
One additional constraint to be aware of is that failure to advertise any of the prefixes in a multi-export Origin will prevent the entire Origin from functioning.
202+
One current limitation to be aware of is that failure to advertise any of the prefixes in a multi-export Origin will prevent the entire Origin from functioning.
195203
For example, if your federation requires an administrator to pre-approve namespaces (as does the OSDF) but only a subset of the namespaces from the Origin are approved at the Registry, this will prevent the entire Origin from joining the federation.
196204
See [Federation Namespace Prefix Registration](#federation-namespace-prefix-registration) for more details.
197205

@@ -304,12 +312,12 @@ Origin:
304312
- StoragePrefix: /first/path
305313
FederationPrefix: /prefix-1
306314
Capabilities: ["Reads", "Writes", "Listings", "DirectReads"]
307-
IssuerUrls: ["https://chtc.cs.wisc.edu.com"]
315+
IssuerUrls: ["https://chtc.cs.wisc.edu"]
308316
309317
- StoragePrefix: /my/data/private
310318
FederationPrefix: /my/prefix/private
311319
Capabilities: ["Reads", "DirectReads"]
312-
IssuerUrls: ["https://chtc.cs.wisc.edu.com"]
320+
IssuerUrls: ["https://chtc.cs.wisc.edu"]
313321
314322
# Specify a human readable name for the Origin, which shows up in the Director's UI.
315323
# Without this specification, the Origin would show up in the Director under its hostname
@@ -319,9 +327,11 @@ Xrootd:
319327

320328
### Federation Namespace Prefix Registration
321329

322-
Registering a federation namespace prefix is the process of claiming the prefix with the federation's [*Registry*](../about-pelican/core-concepts.mdx#registry).
330+
Registering a federation namespace prefix is the process of claiming the prefix with the federation's [***Registry***](../about-pelican/core-concepts.mdx#registry).
323331
This asserts your ownership over the namespace and gives you the ability to further subdivide the prefix by tying it to a public/private key pair you possess.
324332

333+
For more information about how to choose these prefixes, see [Choosing a Namespace Prefix](./choosing-namespaces.mdx).
334+
325335
Generally this process is a pre-requisite to setting up a functional Origin, but it's not included in this page's "Before Starting" section because Origins attempt to do this automatically on server startup.
326336
However, there are some cases where you may not wish to rely on this automatic feature.
327337
These may include:
@@ -354,10 +364,6 @@ Finally, submit the registration, and if your federation requires namespace appr
354364
In the meantime, store your private key someplace safe -- once you're ready to start your Origin, you'll configure it to use the private key using the [`IssuerKeysDirectory`](../parameters.mdx#IssuerKeysDirectory) configuration option.
355365
Once your registration is complete/approved and your keys are hooked up to the Origin, your Origin should have control over your new prefix.
356366

357-
358-
359-
360-
361367
## Serving & Administering Your Origin
362368

363369
Once you've drafted your Origin's configuration and handled the pre-requisites from the [Before Starting](#before-starting) section, you're ready to start serving data.
@@ -384,7 +390,7 @@ Pelican admin interface is not initialized
384390
To initialize, login at https://localhost:8444/view/initialization/code/ with the following code:
385391
551220
386392
```
387-
See the [admin website configuration](#login-to-admin-website) documentation section for more information about initializing your Origin's admin website.
393+
See [Logging in to the Origin's Admin Page](#logging-in-to-the-origins-admin-page) for more information about initializing your Origin's admin website.
388394

389395
### Additional Command Line Arguments for Origins
390396

@@ -394,15 +400,14 @@ This section documents additional arguments you can pass via the command line wh
394400
* **-m or --mode**: Set the mode for the Origin service ('posix'|'s3', defaults to 'posix').
395401
* **-p or --port**: Set the port at which the Pelican admin website should be accessible.
396402
* **--writeable**: A boolean value to allow or disable writing to the Origin (default is true).
397-
* **-v
398-
403+
* **-v**: A shortcut for configuring docker-style volume mounts/namespace prefixes for the Origin (POSIX only). For example, `-v /local/path:/federation/prefix` will bind a directory `/local/path` to the namespace prefix `/federation/prefix`. Use of configuration yaml is strongly preferred over this method because config passed with this flag cannot be picked up by tools like `pelican config summary`.
399404
* **--config**: Set the location of the configuration file.
400405
* **-d or --debug**: Enable debugging mode, which greatly increases Pelican's logging verbosity.
401406
* **-l or --log**: Set the location of a file that will capture Pelican logs. Setting this will prevent logging output from printing to your terminal.
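
To illustrate how a `-v` volume-mount string relates to the yaml export fields shown earlier, the sketch below splits a spec such as `/local/path:/federation/prefix` into its two halves. It is not Pelican's own parser; the function name `parse_volume` is hypothetical, and requiring both halves to be absolute paths is an assumption made for the example.

```python
def parse_volume(spec: str) -> dict:
    """Split a docker-style 'storage:federation' spec into export fields."""
    storage, sep, federation = spec.partition(":")
    # Assumption: both halves must be present and start with '/'.
    if not sep or not storage.startswith("/") or not federation.startswith("/"):
        raise ValueError(f"expected /local/path:/federation/prefix, got {spec!r}")
    return {"StoragePrefix": storage, "FederationPrefix": federation}
```

Here `parse_volume("/local/path:/federation/prefix")` gives `{"StoragePrefix": "/local/path", "FederationPrefix": "/federation/prefix"}`. Settings such as `Capabilities` are not part of the `-v` string, which is one more reason the yaml form is easier to audit.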
402407

403408
For more information about available yaml configuration options, refer to the [Parameters page](../parameters.mdx).
404409

405-
### Login to Admin Website
410+
### Logging in to the Origin's Admin Page
406411

407412
After your Origin is running, the next step is to initialize its admin website, which can be used by administrators for monitoring and further configuration.
408413
To initialize this interface, go to the URL specified in the terminal.
@@ -413,12 +418,12 @@ Copy the passcode from the terminal where you launch Pelican Origin and paste to
413418

414419
<ExportedImage width={1000} height={1000} src={"/pelican/federating-your-data/origin-otp.png"} alt={"Screenshot of Pelican website activation page"} />
415420

416-
In our case, it's `551220` from the example terminal above.
417-
418-
> **NOTE:** that your one-time passcode will be different from the example.
421+
The example terminal from "Starting Your Origin" shows `551220`, but your one-time passcode will be different.
419422

420-
> **NOTE:** These one-time passcodes will be refreshed every few minutes.
423+
<Callout type="info">
424+
These one-time passcodes will be refreshed every few minutes.
421425
Find the latest passcode in the terminal before proceeding.
426+
</Callout>
422427

423428
### Set up password for the admin
424429

@@ -469,8 +474,10 @@ You may change the time range of the graph by changing the **Reporting Period**
469474

470475
<ExportedImage width={1000} height={1000} src={"/pelican/federating-your-data/origin-dashboard-graph.png"} alt={"Screenshot of the graph panel on Pelican Origin website dashboard page"} style={{marginTop: 30}} />
471476

472-
> **NOTE:** This graph may be empty when the Origin first starts, as it takes several minutes to collect enough data for the display.
477+
<Callout type="info">
478+
This graph may be empty when the Origin first starts, as it takes several minutes to collect enough data for the display.
473479
Try refreshing the page after the Origin has been running for ~5 minutes and you should see data being aggregated.
480+
</Callout>
474481

475482
### Test Origin Functionality
476483
