Crash in MDArray API when opening same file from multiple threads #6253

@lnicola

Expected behavior and actual behavior.

The code under "Steps to reproduce the problem" crashes with SIGSEGV in libhdf5.

The trace below is from an ASAN build, but the crash also happens without ASAN:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==1537122==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7fd4d28fae10 bp 0x7fd4cb9fd73f sp 0x7fd4cb9fd4f8 T2)
==1537122==The signal is caused by a READ memory access.
==1537122==Hint: address points to the zero page.
    #0 0x7fd4d28fae10 in H5F_addr_decode (/usr/lib/libhdf5.so.200+0xfae10)
    #1 0x7fd4d2aeede8 in H5VL__native_blob_specific (/usr/lib/libhdf5.so.200+0x2eede8)
    #2 0x7fd4d2adfb97  (/usr/lib/libhdf5.so.200+0x2dfb97)
    #3 0x7fd4d2ae747c in H5VL_blob_specific (/usr/lib/libhdf5.so.200+0x2e747c)
    #4 0x7fd4d2ad4153  (/usr/lib/libhdf5.so.200+0x2d4153)
    #5 0x7fd4d2a5f4b2 in H5T__conv_vlen (/usr/lib/libhdf5.so.200+0x25f4b2)
    #6 0x7fd4d2a508f0 in H5T_convert (/usr/lib/libhdf5.so.200+0x2508f0)
    #7 0x7fd4d28c7f94 in H5D_get_create_plist (/usr/lib/libhdf5.so.200+0xc7f94)
    #8 0x7fd4d2aef860 in H5VL__native_dataset_get (/usr/lib/libhdf5.so.200+0x2ef860)
    #9 0x7fd4d2ad4d47  (/usr/lib/libhdf5.so.200+0x2d4d47)
    #10 0x7fd4d2adcd31 in H5VL_dataset_get (/usr/lib/libhdf5.so.200+0x2dcd31)
    #11 0x7fd4d28a087c in H5Dget_create_plist (/usr/lib/libhdf5.so.200+0xa087c)
    #12 0x7fd4d2cad1a3 in nc4_get_var_meta (/usr/lib/libnetcdf.so.19+0xad1a3)
    #13 0x7fd4d2cad920 in nc4_hdf5_find_grp_var_att (/usr/lib/libnetcdf.so.19+0xad920)
    #14 0x7fd4d2cb3f37 in NC4_HDF5_inq_var_all (/usr/lib/libnetcdf.so.19+0xb3f37)
    #15 0x7fd4d2c2fc96 in nc_inq_var (/usr/lib/libnetcdf.so.19+0x2fc96)
    #16 0x7fd4d2c2fcd7 in nc_inq_varname (/usr/lib/libnetcdf.so.19+0x2fcd7)
    #17 0x7fd4d447ce1d  (/usr/lib/gdalplugins/gdal_netCDF.so+0x62e1d)
    #18 0x7fd4d4475b63  (/usr/lib/gdalplugins/gdal_netCDF.so+0x5bb63)
    #19 0x7fd4d447c711  (/usr/lib/gdalplugins/gdal_netCDF.so+0x62711)
    #20 0x558a18608657 in main::{lambda()#1}::operator()() const (/home/grayshade/gdal-threads/gdal-threads+0x2657)
    #21 0x558a186094c9 in void std::__invoke_impl<void, main::{lambda()#1}>(std::__invoke_other, main::{lambda()#1}&&) (/home/grayshade/gdal-threads/gdal-threads+0x34c9)
    #22 0x558a1860944f in std::__invoke_result<main::{lambda()#1}>::type std::__invoke<main::{lambda()#1}>(main::{lambda()#1}&&) (/home/grayshade/gdal-threads/gdal-threads+0x344f)
    #23 0x558a186093a9 in void std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (/home/grayshade/gdal-threads/gdal-threads+0x33a9)
    #24 0x558a18609351 in std::thread::_Invoker<std::tuple<main::{lambda()#1}> >::operator()() (/home/grayshade/gdal-threads/gdal-threads+0x3351)
    #25 0x558a18609319 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<main::{lambda()#1}> > >::_M_run() (/home/grayshade/gdal-threads/gdal-threads+0x3319)
    #26 0x7fd4d94d62f2 in execute_native_thread_routine /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
    #27 0x7fd4d929f78c  (/usr/lib/libc.so.6+0x8678c)
    #28 0x7fd4d93208e3 in __clone (/usr/lib/libc.so.6+0x1078e3)

Important note: it only seems to happen when opening the same file. If I make a copy of the .nc under another name and use that in the second thread, it doesn't crash any more.

The 2D API might be affected too; I haven't tried it.

Steps to reproduce the problem.

// g++ gdal-threads.cpp -o gdal-threads -pthread -lgdal -fsanitize=address && ./gdal-threads

#include <cstdlib>  // exit
#include <memory>   // std::unique_ptr
#include <thread>

#include "gdal_priv.h"

int main()
{
    GDALAllRegister();
    for (int i = 0; i < 1000; i++) {
        // First thread: open the file and look up a string variable.
        std::thread t1([] {
            auto poDataset = std::unique_ptr<GDALDataset>(
                GDALDataset::Open("alldatatypes.nc", GDAL_OF_MULTIDIM_RASTER));
            if (!poDataset) {
                exit(1);
            }
            auto poRootGroup = poDataset->GetRootGroup();
            if (!poRootGroup) {
                exit(1);
            }
            auto poVar = poRootGroup->OpenMDArray("string_var");
            if (!poVar) {
                exit(1);
            }
        });
        // Second thread: open the same file and close it immediately.
        std::thread t2([] {
            std::unique_ptr<GDALDataset>(
                GDALDataset::Open("alldatatypes.nc", GDAL_OF_MULTIDIM_RASTER));
        });
        t1.join();
        t2.join();
    }
    return 0;
}

Operating system

Arch Linux x64

GDAL version and provenance

  • gdal 3.5.1
  • netcdf 4.9.0
  • hdf5 1.12.2

All of the above are distro packages. I think Arch carries a couple of patches, but the ones for netcdf don't seem relevant.

alldatatypes.zip
