Bad parsing of Zarr metadata in case of an empty variable (v. 4.9.2) #2748

@minomicetto

I ran into a strange error caused by the presence of an empty variable in an input Zarr dataset.
To reproduce it, just try to access any (or most) of the daily datasets with the variable "tasmin" listed in https://cmip6.storage.googleapis.com/pangeo-cmip6.csv via s3, for instance:

gs://cmip6/CMIP6/CMIP/NASA-GISS/GISS-E2-1-G/historical/r1i1p1f1/day/tasmin/gn/v20181015/
gs://cmip6/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/day/tasmin/gn/v20181218/
gs://cmip6/CMIP6/CMIP/BCC/BCC-ESM1/historical/r1i1p1f1/day/tasmin/gn/v20181220/
gs://cmip6/CMIP6/CMIP/SNU/SAM0-UNICON/historical/r1i1p1f1/day/tasmin/gn/v20190323/
gs://cmip6/CMIP6/CMIP/CCCma/CanESM5/historical/r1i1p1f1/day/tasmin/gn/v20190429/
gs://cmip6/CMIP6/CMIP/MRI/MRI-ESM2-0/historical/r1i1p1f1/day/tasmin/gn/v20190603/
gs://cmip6/CMIP6/CMIP/HAMMOZ-Consortium/MPI-ESM-1-2-HAM/historical/r1i1p1f1/day/tasmin/gn/v20190627/
gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-LR/historical/r1i1p1f1/day/tasmin/gn/v20190710/
gs://cmip6/CMIP6/CMIP/MPI-M/MPI-ESM1-2-HR/historical/r1i1p1f1/day/tasmin/gn/v20190710/
gs://cmip6/CMIP6/CMIP/NUIST/NESM3/historical/r1i1p1f1/day/tasmin/gn/v20190812/
gs://cmip6/CMIP6/CMIP/NCC/NorESM2-LM/historical/r1i1p1f1/day/tasmin/gn/v20190815/
gs://cmip6/CMIP6/CMIP/CAS/FGOALS-g3/historical/r1i1p1f1/day/tasmin/gn/v20190826/
gs://cmip6/CMIP6/CMIP/MIROC/MIROC6/historical/r1i1p1f1/day/tasmin/gn/v20191016/
gs://cmip6/CMIP6/CMIP/CSIRO-ARCCSS/ACCESS-CM2/historical/r1i1p1f1/day/tasmin/gn/v20191108/
gs://cmip6/CMIP6/CMIP/NCC/NorESM2-MM/historical/r1i1p1f1/day/tasmin/gn/v20191108/
gs://cmip6/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r1i1p1f1/day/tasmin/gn/v20191115/
gs://cmip6/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/historical/r1i1p1f1/day/tasmin/gn/v20200212/
gs://cmip6/CMIP6/CMIP/AS-RCEC/TaiESM1/historical/r1i1p1f1/day/tasmin/gn/v20200626/
gs://cmip6/CMIP6/CMIP/NCC/NorCPM1/historical/r1i1p1f1/day/tasmin/gn/v20200724/
gs://cmip6/CMIP6/CMIP/CMCC/CMCC-ESM2/historical/r1i1p1f1/day/tasmin/gn/v20210114/

In each of them, consider the empty variable "height". See, for example, https://storage.googleapis.com/cmip6/CMIP6/CMIP/NASA-GISS/GISS-E2-1-G/historical/r1i1p1f1/day/tasmin/gn/v20181015/height/.zarray for its metadata.

In more detail, NC_ENCZARR is returned when the nc library finds the variable "zarr_rank" to be equal to 0 at
https://github.com/Unidata/netcdf-c/blob/main/libnczarr/zsync.c#L1668
because, I believe, it assumes that the variable "height" is not scalar (it has neither "shape" nor "chunk").
Actually, the nc library can detect this right at https://github.com/Unidata/netcdf-c/blob/main/libnczarr/zsync.c#L1608

Of course, the best solution is to remove every reference to "height" from the input dataset, since they are useless. In general, though, empty variables are not uncommon, even in NetCDF files.
A naive workaround I adopted is to change https://github.com/Unidata/netcdf-c/blob/main/libnczarr/zsync.c#L1659 from

if(zvar->scalar) {

to

if(zvar->scalar || !zarr_rank) {

so that the empty variable "height" is treated as scalar and then skipped (I believe), but I don't know what side effects this modification may have.
Could you please provide a smarter, more general way to handle this kind of empty variable?
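
To make the effect of the change easier to discuss, here is a minimal standalone sketch of the two behaviours as I understand them. It is not the actual zsync.c code: the struct, the enum and the function names are all made up for illustration, and it only models the assumption that a rank of 0 on a variable not flagged as scalar currently ends in an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome codes for this sketch only (not netcdf-c constants). */
enum sketch_outcome {
    SKETCH_VAR_SCALAR, /* handled through the scalar branch */
    SKETCH_VAR_ARRAY,  /* handled as a regular dimensioned variable */
    SKETCH_VAR_ERROR   /* stands in for the NC_ENCZARR failure */
};

/* Hypothetical stand-in for the per-variable state (not the real struct). */
struct sketch_var {
    int scalar;   /* plays the role of zvar->scalar */
    size_t rank;  /* plays the role of zarr_rank */
};

/* Assumed current behaviour: rank 0 without the scalar flag is an error. */
enum sketch_outcome classify_original(const struct sketch_var *v)
{
    assert(v != NULL);
    if (v->scalar)
        return SKETCH_VAR_SCALAR;
    if (v->rank == 0)
        return SKETCH_VAR_ERROR;
    return SKETCH_VAR_ARRAY;
}

/* Behaviour with the naive change: rank 0 is folded into the scalar branch. */
enum sketch_outcome classify_patched(const struct sketch_var *v)
{
    assert(v != NULL);
    if (v->scalar || v->rank == 0)
        return SKETCH_VAR_SCALAR;
    return SKETCH_VAR_ARRAY;
}
```

In this model, a variable like "height" (scalar flag unset, rank 0) goes to the error case with classify_original and to the scalar case with classify_patched, while variables with a non-zero rank are classified the same way by both. Whether the real scalar branch is safe for such a variable is exactly what I am unsure about.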

Bye
