nc_put_var_double execution time increases in subsequent runs when a variable is written with chunking and compression #2750

@abhibaruah

NetCDF version: 4.9.1
HDF5 version: 1.10.10
Platform: Windows 11

I have a reproduction program that creates a file, defines an NC_DOUBLE variable, and writes a 2000 x 512 x 512 array to it (chunk sizes of 20 x 10 x 10, deflate compression at level 5). I run this in a for loop of 10 iterations and delete the created .nc file at the end of each iteration.

I time the call to nc_put_var_double and observe that its execution time increases with every subsequent iteration. This happens only on Windows, and only when the variable is written with both chunking and compression enabled.
I see similar behavior with nc_get_var_int as well.

Here is the output from the program (times are in seconds):

In main
Index: 0
In execution
### Execution time is 26.4505
Index: 1
In execution
### Execution time is 30.2897
Index: 2
In execution
### Execution time is 40.5599
Index: 3
In execution
### Execution time is 48.5356
Index: 4
In execution
### Execution time is 51.8821
Index: 5
In execution
### Execution time is 55.844
Index: 6
In execution
### Execution time is 60.3397
Index: 7
In execution
### Execution time is 62.5748
Index: 8
In execution
### Execution time is 67.3835
Index: 9
In execution
### Execution time is 72.4015

Here is my reproduction code:

#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <netcdf.h>
#include <chrono>
#include <iostream>

/* NetCDF file names */
#define FILE_NAME "sample_xyz.nc"


/* Test with 3D data */
#define NDIMS 3
#define NX 2000
#define NY 512
#define NZ 512

#define CHUNKX 20
#define CHUNKY 10
#define CHUNKZ 10

#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

using namespace std;
using namespace std::chrono;

void execution(double *arr)
{
    std::cout << "In execution" << std::endl;
    int status;
    int ncid;
    int varid;
    int dimids[NDIMS];
    int x_dimid, y_dimid;
    int retval;
    const size_t chunksize[NDIMS] = { CHUNKX, CHUNKY, CHUNKZ };

    

    /* Create a NetCDF4 file. ERR prints the error and exits on failure. */
    if ((retval = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
        ERR(retval);

    /* Define dimensions. NY == NZ == 512, so a single 512-length
       dimension is shared by the last two axes of the variable. */
    if ((retval = nc_def_dim(ncid, "dim_512", NY, &y_dimid)))
        ERR(retval);
    if ((retval = nc_def_dim(ncid, "dim_2000", NX, &x_dimid)))
        ERR(retval);

    dimids[0] = x_dimid;
    dimids[1] = y_dimid;
    dimids[2] = y_dimid;

    /* Define the variable */
    if ((retval = nc_def_var(ncid, "data", NC_DOUBLE, NDIMS,
        dimids, &varid)))
        ERR(retval);

    /* Use chunked storage with 20 x 10 x 10 chunks */
    if ((retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksize)))
        ERR(retval);

    /* shuffle off (0), deflate on (1), deflate level 5 */
    if ((retval = nc_def_var_deflate(ncid, varid, 0, 1, 5)))
        ERR(retval);

    /* Time only the write call */
    std::chrono::time_point<std::chrono::high_resolution_clock> timestart, timeend;
    timestart = std::chrono::high_resolution_clock::now();
    retval = nc_put_var_double(ncid, varid, arr);
    timeend = std::chrono::high_resolution_clock::now();
    if (retval)
        ERR(retval);
    std::chrono::duration<double> duration = timeend - timestart;
    double durationInSeconds = duration.count();
    std::cout << "### Execution time is " << durationInSeconds << std::endl;

    /* Close the file, then delete it so every iteration starts fresh */
    status = nc_close(ncid);
    if (status)
        ERR(status);

    remove(FILE_NAME);

}

int main() {

    // Dynamically allocate the flattened 3D array (NX * NY * NZ doubles)
    double* arr = new double[(size_t)NX * NY * NZ];
    size_t index = 0;
    // Fill every element with the constant value 5
    for (int i = 0; i < NX; i++) {
        for (int j = 0; j < NY; j++) {
            for (int k = 0; k < NZ; k++) {
                arr[index] = 5;
                index++;
            }
        }
    }

    std::cout << "In main" << std::endl;

    for (int i = 0; i < 10; i++)
    {
        std::cout << "Index: " << i << std::endl;
        execution(arr);
    }

    delete[] arr;
    return 0;
}

Let me know if I am doing anything wrong or if any additional information is needed from my side.
