Zenodo now offers Data Package as an export format for deposit metadata (e.g. https://zenodo.org/records/10054230/export/datapackage). The export includes the deposit metadata (contributors, license, etc.) and lists all files as resources. These resources are generic (with name, path, format, mimetype, bytes, hash): they are not marked as tabular (even when the files are) and do not contain a schema.
For deposits that have a datapackage.json file, one of the resources listed will be that datapackage.json:
library(frictionless)
(p <- read_package("https://zenodo.org/records/10054230/export/datapackage"))
#> A Data Package with 22 resources:
#> • HG_OOSTENDE-acceleration-2017.csv.gz
#> • HG_OOSTENDE-gps-2013.csv.gz
#> • HG_OOSTENDE-gps-2019.csv.gz
#> • HG_OOSTENDE-gps-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2016.csv.gz
#> • HG_OOSTENDE-gps-2017.csv.gz
#> • HG_OOSTENDE-gps-2016.csv.gz
#> • HG_OOSTENDE-acceleration-2022.csv.gz
#> • HG_OOSTENDE-acceleration-2020.csv.gz
#> • HG_OOSTENDE-acceleration-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2018.csv.gz
#> • HG_OOSTENDE-acceleration-2019.csv.gz
#> • HG_OOSTENDE-acceleration-2013.csv.gz
#> • HG_OOSTENDE-gps-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2015.csv.gz
#> • HG_OOSTENDE-gps-2015.csv.gz
#> • HG_OOSTENDE-gps-2018.csv.gz
#> • datapackage.json
#> • HG_OOSTENDE-gps-2022.csv.gz
#> • HG_OOSTENDE-gps-2020.csv.gz
#> • HG_OOSTENDE-reference-data.csv
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.
read_resource(p, "datapackage.json")
#> Error in `get_schema()` at frictionless-r/R/read_from_path.R:13:3:
#> ! Resource "datapackage.json" must have a profile property with value
#> "tabular-data-resource".
datapackage_path <- frictionless:::get_resource(p, "datapackage.json")$path
read_package(datapackage_path)
#> A Data Package with 3 resources:
#> • reference-data
#> • gps
#> • acceleration
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.
It would be nice if read_package() could notice this and suggest that the user read that file instead.
p <- read_package("https://zenodo.org/records/10054230/export/datapackage")
#> ...
#> One of the listed resources is a "datapackage.json" which may describe
#> the resources in more detail. Read it with
#> `read_package("https://zenodo.org/records/10054230/files/datapackage.json")`.
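A minimal sketch of how the detection behind such a message could work, assuming the package is a plain list with a `resources` element (as returned by read_package()). The helper names `nested_datapackage_path()` and `hint_nested_datapackage()` are hypothetical and not part of frictionless; matching on the resource name or the file name of its path is also an assumption about how Zenodo lists the file.

```r
# Hypothetical helper (not part of frictionless): return the path of the first
# resource that looks like a nested datapackage.json, or NULL if there is none.
nested_datapackage_path <- function(package) {
  for (resource in package$resources) {
    path <- resource$path
    has_path <- is.character(path) && length(path) == 1
    # Assumption: the nested file is recognisable by its resource name or by
    # the file name at the end of its path.
    is_nested <- identical(resource$name, "datapackage.json") ||
      (has_path && basename(path) == "datapackage.json")
    if (is_nested && has_path) {
      return(path)
    }
  }
  NULL
}

# Hypothetical helper: emit a hint when a nested datapackage.json is listed.
hint_nested_datapackage <- function(package) {
  path <- nested_datapackage_path(package)
  if (!is.null(path)) {
    message(
      "One of the listed resources is a \"datapackage.json\" which may ",
      "describe the resources in more detail. Read it with ",
      "`read_package(\"", path, "\")`."
    )
  }
  invisible(package)
}
```

The search is kept separate from the message so that the same lookup could back any of the programmatic options as well.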
This is a good first step, but it does not allow easy programmatic access. Some options to enable that:
- An attribute. NULL if there is no datapackage.json resource:
p1 <- read_package("https://zenodo.org/records/10054230/export/datapackage")
p1$resource_datapackage_path
#> "https://zenodo.org/records/10054230/files/datapackage.json"
p2 <- read_package(p1$resource_datapackage_path)
- Piping read_package(). If you pass a package to read_package(), it attempts to read the nested datapackage.json and returns the original package if none is found:
read_package("https://zenodo.org/records/10054230/export/datapackage") |>
  read_package()
- A merge parameter that tries to merge the first (metadata) and second (resources) datapackage.json files. Note: there is no guarantee that the second one contains better resources info and worse metadata, but it is likely for Zenodo deposits.
read_package(
  "https://zenodo.org/records/10054230/export/datapackage",
  merge = TRUE
)
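The piping and merge options could be prototyped outside the package roughly as follows. This is only a sketch under stated assumptions: `follow_nested_package()` and `merge_packages()` are hypothetical names, the nested file is assumed to be recognisable by its resource name, and the `directory` property is assumed to be what relative resource paths are resolved against. The reader function is passed as an argument so the logic can be tried without network access.

```r
# Hypothetical: if `package` lists a datapackage.json resource, read and return
# that nested package; otherwise return `package` unchanged.
follow_nested_package <- function(package, read = frictionless::read_package) {
  for (resource in package$resources) {
    # Assumption: the nested file is listed under the name "datapackage.json".
    if (identical(resource$name, "datapackage.json") &&
        is.character(resource$path) && length(resource$path) == 1) {
      return(read(resource$path))
    }
  }
  package
}

# Hypothetical: keep the metadata of the outer (Zenodo export) package, but
# take the resources from the inner (deposited) package.
merge_packages <- function(outer, inner) {
  merged <- outer
  merged$resources <- inner$resources
  # Assumption: relative resource paths are resolved against `directory`, so
  # it has to follow the inner package once its resources are used.
  merged$directory <- inner$directory
  merged
}
```

With these in place, the piped form would be `read_package(url) |> follow_nested_package()`, and the merged form `merge_packages(p, follow_nested_package(p))`. Whether the inner metadata should ever override the outer metadata is left open here.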
It would be good to investigate how other implementations do this. @roll how is this implemented in dpkit and/or Python?