Suggest to read datapackage.json if it is one of the resources

Zenodo now offers Data Package as an export format for deposit metadata (e.g. https://zenodo.org/records/10054230/export/datapackage). The export includes the deposit metadata (contributors, license, etc.) and lists all files as resources. These resources are generic (only name, path, format, mimetype, bytes and hash): they are not declared as tabular (even when they are) and have no schema.

For deposits that include a `datapackage.json` file, that file is listed as one of the resources:

``` r
library(frictionless)
(p <- read_package("https://zenodo.org/records/10054230/export/datapackage"))
#> A Data Package with 22 resources:
#> • HG_OOSTENDE-acceleration-2017.csv.gz
#> • HG_OOSTENDE-gps-2013.csv.gz
#> • HG_OOSTENDE-gps-2019.csv.gz
#> • HG_OOSTENDE-gps-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2016.csv.gz
#> • HG_OOSTENDE-gps-2017.csv.gz
#> • HG_OOSTENDE-gps-2016.csv.gz
#> • HG_OOSTENDE-acceleration-2022.csv.gz
#> • HG_OOSTENDE-acceleration-2020.csv.gz
#> • HG_OOSTENDE-acceleration-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2018.csv.gz
#> • HG_OOSTENDE-acceleration-2019.csv.gz
#> • HG_OOSTENDE-acceleration-2013.csv.gz
#> • HG_OOSTENDE-gps-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2015.csv.gz
#> • HG_OOSTENDE-gps-2015.csv.gz
#> • HG_OOSTENDE-gps-2018.csv.gz
#> • datapackage.json
#> • HG_OOSTENDE-gps-2022.csv.gz
#> • HG_OOSTENDE-gps-2020.csv.gz
#> • HG_OOSTENDE-reference-data.csv
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.

read_resource(p, "datapackage.json")
#> Error in `get_schema()` at frictionless-r/R/read_from_path.R:13:3:
#> ! Resource "datapackage.json" must have a profile property with value
#>   "tabular-data-resource".

# get_resource() is internal (not exported), hence `:::`
datapackage_path <- frictionless:::get_resource(p, "datapackage.json")$path
read_package(datapackage_path)
#> A Data Package with 3 resources:
#> • reference-data
#> • gps
#> • acceleration
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.
```

It would be nice if `read_package()` could detect this and suggest that the user read that file instead.

```R
p <- read_package("https://zenodo.org/records/10054230/export/datapackage")
#> ...
#> One of the listed resources is a "datapackage.json" which may describe
#> the resources in more detail. Read it with
#> `read_package("https://zenodo.org/records/10054230/files/datapackage.json")`.
```
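A minimal base-R sketch of such a check, written without frictionless internals. `datapackage_json_path()` and `suggest_datapackage_json()` are hypothetical names, not part of frictionless; the sketch assumes a package is a list whose `resources` element holds resources with a `name` and a single-string `path`, as in the Zenodo export above:

```R
# Hypothetical helper (not part of frictionless): return the path of the
# resource named "datapackage.json", or NULL if there is none.
# Assumes `package` is a list whose `resources` element is a list of
# resources, each a list with `name` and a single-string `path`.
datapackage_json_path <- function(package) {
  matches <- Filter(
    function(resource) identical(resource$name, "datapackage.json"),
    package$resources
  )
  if (length(matches) == 0) {
    return(NULL)
  }
  matches[[1]]$path
}

# Hypothetical check that read_package() could run before returning:
# emits the suggested message and returns the package unchanged.
suggest_datapackage_json <- function(package) {
  path <- datapackage_json_path(package)
  if (!is.null(path)) {
    message(
      "One of the listed resources is a \"datapackage.json\" which may ",
      "describe the resources in more detail. Read it with\n",
      "`read_package(\"", path, "\")`."
    )
  }
  invisible(package)
}

# Usage on a mock package (not real Zenodo output)
mock <- list(resources = list(
  list(name = "reference-data.csv", path = "reference-data.csv"),
  list(
    name = "datapackage.json",
    path = "https://zenodo.org/records/10054230/files/datapackage.json"
  )
))
suggest_datapackage_json(mock)
```

Because the lookup returns the path rather than only printing it, the same helper could back a programmatic interface as well as the message.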

---

A message is a good first approach, but it doesn't allow easy programmatic access. Some options for that:

1. A property on the returned package, `NULL` if there is no `datapackage.json` resource:

    ```R
    p1 <- read_package("https://zenodo.org/records/10054230/export/datapackage")
    p1$resource_datapackage_path
    #> [1] "https://zenodo.org/records/10054230/files/datapackage.json"
    p2 <- read_package(p1$resource_datapackage_path)
    ```

2. Piping into `read_package()`. If you pass a Data Package object to `read_package()`, it attempts to read the nested `datapackage.json`, or returns the original package if there is none:

    ```R
    read_package("https://zenodo.org/records/10054230/export/datapackage") |>
      read_package()
    ```

3. A `merge` parameter that tries to merge the first (metadata) and second (resources) `datapackage.json` files. Note: there is no guarantee that the second one has better `resources` information and poorer metadata, but that is likely for Zenodo deposits.

    ```R
    read_package(
      "https://zenodo.org/records/10054230/export/datapackage",
      merge = TRUE
    )
    ```
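For option 3, the merge itself could be little more than list manipulation. A minimal base-R sketch, where `merge_packages()` is a hypothetical name and the merge rule (outer metadata is kept, inner resources replace the outer ones, inner-only properties are added) is an assumption rather than settled behavior. It also assumes relative resource paths are resolved against a `directory` element, so that element is taken from the inner package:

```R
# Hypothetical merge (not part of frictionless): keep the metadata of the
# outer package (e.g. the Zenodo export) and take the resources from the
# inner package (the deposit's own datapackage.json).
# Assumes both packages are lists with a `resources` element and that
# relative resource paths are resolved against a `directory` element.
merge_packages <- function(outer, inner) {
  merged <- outer
  merged$resources <- inner$resources
  # Keep the inner directory so the inner resources' relative paths resolve
  merged$directory <- inner$directory
  # Add top-level properties that only the inner package has
  missing <- setdiff(names(inner), names(merged))
  merged[missing] <- inner[missing]
  merged
}

# Usage on mock packages (not real Zenodo output)
outer <- list(
  title = "Title from the Zenodo export",
  resources = list(list(name = "datapackage.json", path = "datapackage.json"))
)
inner <- list(
  name = "inner-package",
  resources = list(list(name = "gps", path = "gps.csv")),
  directory = "https://zenodo.org/records/10054230/files"
)
merged <- merge_packages(outer, inner)
```

Taking `directory` from the inner package matters when the inner `datapackage.json` lists its files with relative paths: keeping the outer directory would point those paths at the wrong location.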

It would be good to investigate how other implementations do this. @roll how is this implemented in dpkit and/or Python?

Suggest to read datapackage.json if it is one of the resources #287