docs/pages/federating-your-data/origin.mdx
31 additions & 24 deletions
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,7 @@ import { Callout } from 'nextra/components'
4
4
5
5
# Federating Your Data via a Pelican Origin
6
6
7
-
Pelican users who want to share data within a Pelican federation do so via an [*Origin*](../about-pelican/core-concepts.mdx#origins).
7
+
Pelican users who want to share data within a Pelican federation do so via an [***Origin***](../about-pelican/core-concepts.mdx#origins).
8
8
Origins are a crucial component of Pelican's architecture for several reasons: they act as an adapter between various storage backends and Pelican federations, they provide fine-grained access controls for that data, and they act as a circuit breaker that protects the underlying data repository from large volumes of data movement.
9
9
That is, they figure out how to take data from wherever it lives (such as a POSIX filesystem, S3 buckets, HTTPS servers, etc.) and transform it into a format that the federation can utilize, all while respecting your data access requirements and protecting the storage they make accessible.
10
10
@@ -31,11 +31,13 @@ Pelican Origins have two major components -- one is a data transfer endpoint pow
31
31
By design, these two components are hosted behind two separate ports, each dedicated to a distinct function.
32
32
Pelican has chosen ports 8443 for data transfers and 8444 for the browser interface as defaults, but you may change these port numbers through your Origin's [configuration file](../parameters.mdx) with parameters [`Origin.Port`](../parameters.mdx#Origin-Port) and [`Server.WebPort`](../parameters.mdx#Server-WebPort), respectively.
33
33
34
-
In order for Pelican Origins to work properly, **both** of these ports need to be accessible by the federation, which in most cases means they need to be open to the internet. If your server host has a firewall policy in place, please open these two ports for both incoming the outgoing TCP requests.
34
+
In order for Pelican Origins to work properly, **both** of these ports need to be accessible by the federation, which in most cases means they need to be open to the internet.
35
+
If your server host has a firewall policy in place, please open these two ports for both incoming and outgoing TCP requests.
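As a quick sanity check of the firewall rules described above, the following is a minimal sketch that tests whether a TCP connection can be opened to a given port. The function name `port_is_reachable` is hypothetical and not part of Pelican. A successful connection only shows that the port accepts TCP connections from where you run the check, not that the Origin itself is healthy.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

To test an Origin using the default ports, you could call `port_is_reachable("example.com", 8443)` and `port_is_reachable("example.com", 8444)` from a machine outside your network, substituting your own hostname for the placeholder `example.com`.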
35
36
36
37
### Preparing TLS Credentials
37
38
38
-
Data transfers in Pelican rely on HTTPS, the web encryption scheme used by everyone from banks to instagram that's responsible for securely transmitting data between internet-connected computers. To configure the Origin with HTTPS, you'll first need to acquire three things:
39
+
Data transfers in Pelican rely on HTTPS, the web encryption scheme used by everyone from banks to Instagram that's responsible for securely transmitting data between internet-connected computers.
40
+
To configure the Origin with HTTPS, you'll first need to acquire three things:
39
41
40
42
- A valid Transport Layer Security (TLS) certificate
41
43
- The private key associated with the certificate
@@ -59,11 +61,17 @@ Once you go through the process, locate your credential files and set the follow
59
61
60
62
Since your TLS certificate is associated with your domain name, you will need to change the default hostname of your Pelican server to match. Set `Server.Hostname` to your domain name (e.g. `example.com`).
61
63
62
-
### Picking a Federation to join
64
+
### Picking a Federation and your Namespace Prefix(es)
63
65
64
-
Before serving an Origin, you need to decide which [**federation**](../about-pelican/core-concepts.mdx#federations) your data will be accessed through. For example, the Open Science Data Federation (OSDF) is Pelican's flagship federation, and if you are interested in serving an OSDF Origin, you can refer to the [OSDF website](https://osg-htc.org/services/osdf.html) for details about how to join. If you're unsure about which federation to join and aren't ready to run your own federation, this is a good place to start.
66
+
Before serving an Origin, you need to decide which [***federation***](../about-pelican/core-concepts.mdx#federations) your data will be accessed through. For example, the Open Science Data Federation (OSDF) is Pelican's flagship federation, and if you are interested in serving an OSDF Origin, you can refer to the [OSDF website](https://osg-htc.org/services/osdf.html) for details about how to join. If you're unsure about which federation to join and aren't ready to run your own federation, this is a good place to start.
65
67
66
-
All federations are uniquely identified their URL. For example, the OSDF's URL is `https://osg-htc.org` and Pelican command line client commands that interact with objects from this federation would indicate this by using Pelican URLs like `pelican://osg-htc.org/some/namespace/path`.
68
+
All federations are uniquely identified by their URL. For example, the OSDF's URL is `https://osg-htc.org`, and Pelican command line client commands that interact with objects from this federation indicate this by using Pelican URLs like `pelican://osg-htc.org/some/namespace/path`.
69
+
70
+
Once you've picked a federation, you should think about the namespace prefix(es) you'll want to tie your data to.
71
+
Namespace prefixes map data from Origins into something resembling a "file path" within their federation.
72
+
For example, an S3 bucket with data about whale sightings may be mapped to the namespace prefix `/whales`, such that an object named `2025-sightings.csv` in the bucket would be referred to as `/whales/2025-sightings.csv`.
73
+
Its fully-qualified name, scoped to the federation, would then be `pelican://<federation URL>/whales/2025-sightings.csv`.
74
+
While it's convenient to think of these prefixes as file paths, it should be noted the comparison is only logical -- there isn't necessarily a `/whales` directory anywhere.
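To make the mapping concrete, here is a small sketch of how a fully-qualified Pelican URL can be assembled from a federation hostname, a namespace prefix, and an object name. The helper `pelican_url` is hypothetical and only illustrates the naming scheme. It is not how Pelican resolves objects internally, and `/whales` is not assumed to be a real OSDF namespace.

```python
def pelican_url(federation_host: str, prefix: str, object_name: str) -> str:
    """Join a federation hostname, namespace prefix, and object name into a Pelican URL."""
    # Strip stray slashes so "/whales/" + "/2025-sightings.csv" does not yield "//".
    parts = [p.strip("/") for p in (prefix, object_name)]
    path = "/".join(p for p in parts if p)
    return f"pelican://{federation_host}/{path}"


print(pelican_url("osg-htc.org", "/whales", "2025-sightings.csv"))
# pelican://osg-htc.org/whales/2025-sightings.csv
```

The prefix is treated purely as a string here, which mirrors the point above: it is a logical label, not a directory that has to exist on disk.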
67
75
68
76
For more information about how to choose prefixes, see [Choosing a Namespace Prefix](./choosing-namespaces.mdx).
69
77
@@ -114,7 +122,7 @@ Available capabilities include:
114
122
- `PublicReads`: When set, objects from the namespace become public and require no authorization to read.
115
123
- `Writes`: When included, objects can be written back to the storage backend by Pelican. Write operations _always_ require a valid authorization token.
116
124
- `DirectReads`: When included, a namespace indicates that it is willing to serve clients directly and does not require data to be pulled through a cache. Disabling this feature may be useful in cases where the Origin isn't very performant or has to pay egress costs when data moves through it. Note that this is respected by federation central services, but may not be respected by all clients.
117
-
- `Listings`: When included, the namespace indicates it will allow object discovery. Be careful when setting this for authorized namespaces, as this will allow anyone to discover the names of objects exported by this namespace. Listings is required if your Origin must support any recursive operations, such as downloading entire directories or object prefixes.
125
+
- `Listings`: When included, the namespace indicates it permits object discovery. Authorization requirements for listing objects through an Origin are tied to the values of `Reads` and `PublicReads`. If your namespace sets `Reads`, object discovery will require a valid token, while prefixes with `PublicReads` will not require tokens. This capability is ***required*** if your Origin must support any recursive operations, such as downloading entire directories or object prefixes.
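The interaction between `Listings`, `Reads`, and `PublicReads` can be summarized in a short decision function. This is a sketch of the rules as described in the list above, not Pelican's implementation. The function name and return labels are hypothetical, and the choice to let `PublicReads` take precedence when both read capabilities are set is an assumption.

```python
def listing_access(capabilities: set[str]) -> str:
    """Classify how object discovery behaves for a namespace's capability set."""
    if "Listings" not in capabilities:
        return "disabled"  # no object discovery, so no recursive operations
    if "PublicReads" in capabilities:
        # Assumption: PublicReads wins if both PublicReads and Reads are set.
        return "public"  # listings need no token
    if "Reads" in capabilities:
        return "token-required"  # listings need a valid token
    return "no-read-capability"  # neither Reads nor PublicReads is set


print(listing_access({"Reads", "Listings"}))
# token-required
```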
118
126
119
127
<Callout type="warning">
120
128
Most Origins should have either `Reads` or `PublicReads` enabled. If neither is set, the Origin won't export any data.
@@ -191,7 +199,7 @@ Multiple namespaces can be exported by the same Origin but they must all have th
191
199
That is, if the Origin serves files from POSIX, it can _only_ serve files from POSIX and not objects from S3.
192
200
However, separate Origins can serve files from POSIX and objects from S3 under the same namespace prefix, allowing the Origin administrators to aggregate data under a unified namespace.
193
201
194
-
One additional constraint to be aware of is that failure to advertise any of the prefixes in a multi-export Origin will prevent the entire Origin from functioning.
202
+
One current limitation to be aware of is that failure to advertise any of the prefixes in a multi-export Origin will prevent the entire Origin from functioning.
195
203
For example, if your federation requires an administrator to pre-approve namespaces (as does the OSDF) but only a subset of the namespaces from the Origin are approved at the Registry, this will prevent the entire Origin from joining the federation.
196
204
See [Federation Namespace Prefix Registration](#federation-namespace-prefix-registration) for more details.
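The all-or-nothing behavior can be modeled with a simple pre-flight check. This is a simplified sketch under the assumption that you know which prefixes have been approved at the Registry. The function `origin_can_join` and its inputs are hypothetical and do not query a real Registry.

```python
def origin_can_join(exported: list[str], approved: set[str]) -> tuple[bool, list[str]]:
    """Return whether every exported prefix is approved, plus the unapproved ones."""
    missing = [prefix for prefix in exported if prefix not in approved]
    # A single unapproved prefix blocks the whole Origin, not just that export.
    return (not missing, missing)


ok, missing = origin_can_join(["/whales", "/dolphins"], {"/whales"})
print(ok, missing)
# False ['/dolphins']
```

Running a check like this before startup makes it easier to see which prefix is holding up a multi-export Origin.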
# Specify a human readable name for the Origin, which shows up in the Director's UI.
315
323
# Without this specification, the Origin would show up in the Director under its hostname
@@ -319,9 +327,11 @@ Xrootd:
319
327
320
328
### Federation Namespace Prefix Registration
321
329
322
-
Registering a federation namespace prefix is the process of claiming the prefix with the federation's [*Registry*](../about-pelican/core-concepts.mdx#registry).
330
+
Registering a federation namespace prefix is the process of claiming the prefix with the federation's [***Registry***](../about-pelican/core-concepts.mdx#registry).
323
331
This asserts your ownership over the namespace and gives you the ability to further subdivide the prefix by tying it to a public/private key pair you possess.
324
332
333
+
For more information about how to choose these prefixes, see [Choosing a Namespace Prefix](./choosing-namespaces.mdx).
334
+
325
335
Generally this process is a pre-requisite to setting up a functional Origin, but it's not included in this page's "Before Starting" section because Origins attempt to do this automatically on server startup.
326
336
However, there are some cases where you may not wish to rely on this automatic feature.
327
337
These may include:
@@ -354,10 +364,6 @@ Finally, submit the registration, and if your federation requires namespace appr
354
364
In the meantime, store your private key someplace safe -- once you're ready to start your Origin, you'll configure it to use the private key using the [`IssuerKeysDirectory`](../parameters.mdx#IssuerKeysDirectory) configuration option.
355
365
Once your registration is complete/approved and your keys are hooked up to the Origin, your Origin should have control over your new prefix.
356
366
357
-
358
-
359
-
360
-
361
367
## Serving & Administering Your Origin
362
368
363
369
Once you've drafted your Origin's configuration and handled the pre-requisites from the [Before Starting](#before-starting) section, you're ready to start serving data.
@@ -384,7 +390,7 @@ Pelican admin interface is not initialized
384
390
To initialize, login at https://localhost:8444/view/initialization/code/ with the following code:
385
391
551220
386
392
```
387
-
See the [admin website configuration](#login-to-admin-website) documentation section for more information about initializing your Origin's admin website.
393
+
See [Logging in to the Origin's Admin Page](#logging-in-to-the-origins-admin-page) for more information about initializing your Origin's admin website.
388
394
389
395
### Additional Command Line Arguments for Origins
390
396
@@ -394,15 +400,14 @@ This section documents additional arguments you can pass via the command line wh
394
400
* **-m or --mode**: Set the mode for the Origin service ('posix'|'s3', defaults to 'posix').
395
401
* **-p or --port**: Set the port at which the Pelican admin website should be accessible.
396
402
* **--writeable**: A boolean value to allow or disable writing to the Origin (default is true).
397
-
* **-v
398
-
403
+
* **-v**: A shortcut for configuring docker-style volume mounts/namespace prefixes for the Origin (POSIX only). For example, `-v /local/path:/federation/prefix` will bind a directory `/local/path` to the namespace prefix `/federation/prefix`. Use of configuration yaml is strongly preferred over this method because config passed with this flag cannot be picked up by tools like `pelican config summary`.
399
404
* **--config**: Set the location of the configuration file.
400
405
* **-d or --debug**: Enable debugging mode, which greatly increases Pelican's logging verbosity.
401
406
* **-l or --log**: Set the location of a file that will capture Pelican logs. Setting this will prevent logging output from printing to your terminal.
402
407
403
408
For more information about available yaml configuration options, refer to the [Parameters page](../parameters.mdx).
404
409
405
-
### Login to Admin Website
410
+
### Logging in to the Origin's Admin Page
406
411
407
412
After your Origin is running, the next step is to initialize its admin website, which can be used by administrators for monitoring and further configuration.
408
413
To initialize this interface, go to the URL specified in the terminal.
@@ -413,12 +418,12 @@ Copy the passcode from the terminal where you launch Pelican Origin and paste to