| 1 | +DANS BagPack Profile v1.1.0 |
| 2 | +=========================== |
| 3 | + |
| 4 | +Introduction |
| 5 | +------------ |
| 6 | + |
| 7 | +### Version |
| 8 | + |
| 9 | +* Document version: 1.1.0 |
| 10 | +* Publication date: n/a |
| 11 | + |
| 12 | +### Status |
| 13 | + |
| 14 | +The status of this document is DRAFT. |
| 15 | + |
| 16 | +### Changes |
| 17 | + |
| 18 | +#### Changed from version 1.0.0 to 1.1.0 |
| 19 | + |
| 20 | +Requirement 1.1 now also allows "holey bags", to support large external objects. This change is backwards compatible because bags that were valid
| 21 | +under the previous version of this specification remain valid.
| 22 | + |
| 23 | +### Scope |
| 24 | + |
| 25 | +This document specifies what constitutes an acceptable DANS BagPack. This includes all the requirements for a bag to be successfully processed by the DANS Data |
| 26 | +Vault ingest workflow. |
| 27 | + |
| 28 | +### Overview and Conventions |
| 29 | + |
| 30 | +#### Keywords |
| 31 | + |
| 32 | +The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be |
| 33 | +interpreted as described in [RFC 2119]{:target=_blank}. |
| 34 | + |
| 35 | +The keyword "SHOULD" is also used for requirements that are impossible or impractical for the archival organization (i.e., DANS) to check. The client
| 36 | +should do its best to meet these requirements but should not rely on their being validated by the archival organization.
| 37 | + |
| 38 | +#### Subdivisions |
| 39 | + |
| 40 | +The requirements are subdivided into the following sections: |
| 41 | + |
| 42 | +* RDA BagPack Related - requirements that refer back to the [RDA BagPack]{:target=_blank} specifications. If a bag only needs to comply with the RDA BagPack |
| 43 | + specifications, then it should be sufficient to only check this section. |
| 44 | +* Extra Requirements for DANS BagPack - requirements that are specific to the DANS BagPack Profile, and which are in addition to the RDA BagPack requirements. |
| 45 | + |
| 46 | +The sections are numbered and may have numbered subsections. The requirements themselves are stated as numbered rules. Rules may have parts that are labeled |
| 47 | +with letters: (a), (b), (c), etc. To uniquely identify a specific rule, use the notation |
| 48 | + |
| 49 | +``` |
| 50 | +<section-nr>[.<subsection-nr>].<rule-nr> [(<letter>)] |
| 51 | +``` |
| 52 | + |
| 53 | +Example: `2.3.4 (e)` means part **e** of the fourth rule in subsection 3 of section 2. |
| 54 | + |
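The notation above can be read mechanically. The following is a minimal sketch of a parser for it; the names `RuleId` and `parse_rule_id` are hypothetical and not part of this profile, and the sketch assumes that the part letter is a single lowercase letter.

```python
import re
from typing import NamedTuple, Optional

# <section-nr>[.<subsection-nr>].<rule-nr> [(<letter>)]
RULE_ID = re.compile(
    r"^(?P<section>\d+)"
    r"(?:\.(?P<subsection>\d+))?"
    r"\.(?P<rule>\d+)"
    r"(?:\s*\((?P<part>[a-z])\))?$"
)


class RuleId(NamedTuple):
    """Hypothetical container for the components of a rule identifier."""
    section: int
    subsection: Optional[int]
    rule: int
    part: Optional[str]


def parse_rule_id(text: str) -> RuleId:
    """Split a rule identifier such as '2.3.4 (e)' into its components."""
    match = RULE_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a rule identifier: {text!r}")
    subsection = match.group("subsection")
    return RuleId(
        section=int(match.group("section")),
        subsection=int(subsection) if subsection else None,
        rule=int(match.group("rule")),
        part=match.group("part"),
    )


print(parse_rule_id("2.3.4 (e)"))
```

An identifier without a subsection, such as `1.1`, yields `subsection=None`.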
| 55 | +#### XML namespaces |
| 56 | + |
| 57 | +When an XML element name, attribute name, or attribute value is written with a prefix (such as `schema:name`), the name belongs to a specific namespace.
| 58 | +The table below maps each prefix to its namespace. In an actual document, the namespace may be bound to a different prefix or may be the default
| 59 | +namespace.
| 60 | + |
| 61 | +| Prefix | Namespace URI | Namespace documentation | |
| 62 | +|-----------|---------------------------------------------------------------------|--------------------------------------------------| |
| 63 | +| `schema` | `http://schema.org/` | [schema.org]{:target=_blank} | |
| 64 | +| `dvcore` | `https://dataverse.org/schema/core#` | Dataverse metadata elements | |
| 65 | +| `vaultMd` | `https://schemas.dans.knaw.nl/metadatablock/dansDataVaultMetadata#` | [DANS Data Vault Metadata block]{:target=_blank} | |
| 66 | + |
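Because a document may bind a namespace to a different prefix, names are best compared after expansion to their full URI. The sketch below illustrates this with the prefixes from the table; the function name `expand` and the alternative prefix `s` are hypothetical.

```python
# Prefix-to-namespace mapping as used in this document.
NAMESPACES = {
    "schema": "http://schema.org/",
    "dvcore": "https://dataverse.org/schema/core#",
    "vaultMd": "https://schemas.dans.knaw.nl/metadatablock/dansDataVaultMetadata#",
}


def expand(prefixed_name: str, bindings: dict = NAMESPACES) -> str:
    """Expand '<prefix>:<local-name>' to a full URI using the given bindings."""
    prefix, sep, local_name = prefixed_name.partition(":")
    if not sep or prefix not in bindings:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return bindings[prefix] + local_name


# A document that binds schema.org to the (hypothetical) prefix 's'
# still refers to the same name after expansion.
document_bindings = {"s": "http://schema.org/"}
print(expand("schema:name") == expand("s:name", document_bindings))
```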
| 67 | +Requirements |
| 68 | +------------ |
| 69 | + |
| 70 | +### 1. RDA BagPack Related |
| 71 | + |
| 72 | +The following items are required by the [RDA BagPack]{:target=_blank} specifications: |
| 73 | + |
| 74 | +1. One of the following MUST hold: |
| 75 | + * the DANS BagPack is a valid bag, according to [BagIt v1.0]{:target=_blank} or [BagIt v0.97]{:target=_blank} |
| 76 | + * the DANS BagPack is a holey bag (i.e., a bag with a [fetch.txt]{:target=_blank} file listing the missing files and their fetch URLs). The files to be |
| 77 | + fetched MUST be downloadable from the given URL or obtainable from a well-known location and have the checksums listed in the payload manifests. |
| 78 | + "Obtainable from a well-known location" means that the repository containing the bag documents how to map the fetch-URL or a checksum for the file to |
| 79 | + the location where the file data is stored. |
| 80 | +2. (a) A DANS BagPack MUST contain a file `metadata/datacite.xml`; (b) this file MUST be valid according to the
| 81 | + [DataCite schema version 4.0 or later]{:target=_blank}, except for the requirement that there MUST be a DOI present: a DOI is not required for a DANS |
| 82 | + BagPack; (c) [DataCite's recommended properties]{:target=_blank} SHOULD be present. |
| 83 | +3. Other files besides `datacite.xml` MAY be present in the `metadata` folder. |
| 84 | + |
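For the holey-bag case in rule 1.1, a client might read `fetch.txt` and compare each fetched file against the checksum in the payload manifest. The following is a minimal sketch, assuming the BagIt line layout `<url> <length> <filepath>`, where the length may be `-`. The names `FetchEntry`, `parse_fetch_txt`, and `checksum_matches` are hypothetical; the download itself and the lookup in a well-known location are left out.

```python
import hashlib
from typing import List, NamedTuple, Optional


class FetchEntry(NamedTuple):
    """One line of fetch.txt (hypothetical helper type)."""
    url: str
    length: Optional[int]  # None when fetch.txt gives '-' (length unknown)
    path: str              # payload path relative to the bag root


def parse_fetch_txt(text: str) -> List[FetchEntry]:
    """Parse the lines of a fetch.txt file into entries."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        # The URL and the length contain no whitespace; the path may.
        parts = line.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"fetch.txt line {line_no}: expected 3 fields")
        url, length, path = parts
        entries.append(
            FetchEntry(url, None if length == "-" else int(length), path.strip())
        )
    return entries


def checksum_matches(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Compare fetched bytes with the checksum from a payload manifest."""
    return hashlib.new(algorithm, data).hexdigest() == expected_hex.lower()


# Hypothetical example URL and path.
sample = "https://example.org/objects/large.bin 1024 data/large.bin\n"
print(parse_fetch_txt(sample))
```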
| 85 | +### 2. Extra Requirements for DANS BagPack |
| 86 | + |
| 87 | +The following items are required by the DANS BagPack Profile, in addition to the requirements of RDA BagPack: |
| 88 | + |
| 89 | +1. The `bag-info.txt` file SHOULD contain an element `BagIt-Profile-Identifier` set to the identifier of the [DANS BagPack BagIt Profile]{:target=_blank}: |
| 90 | + `https://doi.org/10.17026/e948-0r32`. |
| 91 | +2. (a) The bag MUST conform to the [DANS BagPack BagIt Profile]{:target=_blank} (even if the `BagIt-Profile-Identifier` element pointing to it is missing). (b) |
| 92 | + The bag SHOULD conform to any other BagIt profiles declared in the `BagIt-Profile-Identifier` element. |
| 93 | +3. There MUST be a file called `metadata/pid-mapping.txt`. Each row of this file MUST be formatted as `<identifier> <referenced object>`, where
| 94 | +   `<identifier>` is a unique URI and `<referenced object>` is the path to the file relative to the root of the bag, and both are separated by one or more
| 95 | +   spaces. One of the lines MAY be a mapping from the dataset DOI to a folder directly under the `data` folder.
| 96 | +4. (a) There MUST be a `metadata/oai-ore.json` file which MUST be a valid JSON-LD 1.0 or higher document; (b) the object described in the
| 97 | +   document MUST have the attribute `vaultMd:dansBagId` whose value is a URN:UUID; (c) the `ore:AggregatedResource`s of the `ore:Aggregation` MUST have the
| 98 | +   following attributes: (i) `@id` whose value is a URI; (ii) `schema:name`; (iii) `dvcore:restricted`, with value true or false.
| 99 | +5. There MUST be a one-to-one mapping between the files in the `data` folder and the files described in the Aggregation contained in the `oai-ore.jsonld` file:
| 100 | + (a) all identifiers found in 2.4(c)(i) MUST be present in the left column of `pid-mapping.txt`; (b) the set of paths pointing to files found in the right |
| 101 | + column of `pid-mapping.txt` MUST be equal to the set of paths of files present in the `data` folder (relative to the bag root). |
| 102 | + |
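Rules 2.3 and 2.5 can be checked together. The sketch below is one possible approach, not the validation performed by DANS: it parses `pid-mapping.txt`, then compares identifiers and paths as sets. The names `parse_pid_mapping` and `check_mapping` are hypothetical. The sketch assumes that all payload files are physically present under `data` (so a holey bag would have to be completed first) and that the identifiers of the aggregated resources have already been extracted from the OAI-ORE document.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List


def parse_pid_mapping(text: str) -> Dict[str, str]:
    """Parse rows of '<identifier> <referenced object>' (rule 2.3)."""
    mapping: Dict[str, str] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        # Identifier and path are separated by one or more spaces.
        parts = re.split(r" +", line.strip(), maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected two columns")
        identifier, path = parts
        if identifier in mapping:
            raise ValueError(f"line {line_no}: duplicate identifier {identifier}")
        mapping[identifier] = path
    return mapping


def check_mapping(
    bag_root: Path, mapping: Dict[str, str], ore_ids: Iterable[str]
) -> List[str]:
    """Return a list of problems found for rules 2.5 (a) and 2.5 (b)."""
    problems = []
    # 2.5 (a): every aggregated-resource identifier is in the left column.
    for identifier in sorted(set(ore_ids) - set(mapping)):
        problems.append(f"identifier not in pid-mapping.txt: {identifier}")
    # 2.5 (b): mapped file paths equal the files under data/.
    # Folder mappings (such as the dataset DOI line) are left out.
    mapped_files = {p for p in mapping.values() if not (bag_root / p).is_dir()}
    data_files = {
        f.relative_to(bag_root).as_posix()
        for f in (bag_root / "data").rglob("*")
        if f.is_file()
    }
    for path in sorted(mapped_files - data_files):
        problems.append(f"mapped file not in data folder: {path}")
    for path in sorted(data_files - mapped_files):
        problems.append(f"data file not in pid-mapping.txt: {path}")
    return problems
```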
| 103 | +[RFC 2119]: {{ rfc_2119 }} |
| 104 | +[BagIt v1.0]: {{ bagit }} |
| 105 | +[fetch.txt]: {{ fetch_txt }} |
| 106 | +[BagIt v0.97]: {{ bagit_0_97 }} |
| 107 | +[RDA BagPack]: {{ rda_bagpack }} |
| 108 | +[DataCite schema version 4.0 or later]: {{ datacite_4_0 }} |
| 109 | +[DANS BagPack BagIt Profile]: {{ dans_bagpack_bagit_profile }} |
| 110 | +[DataCite's recommended properties]: {{ levels_of_obligation }} |
| 111 | +[schema.org]: {{ schema_org }} |
| 112 | +[DANS Data Vault Metadata block]: {{ dans_data_vault_metadata_block }} |