Problems with reading "big" arrays (>8.1Gb) #383

@durack1

Description

Describe the bug
I have hit a reproducible error where big arrays (>8.1 GB) are not read correctly; an all-zero array (rather than the real values) is returned instead. I was a little puzzled by this error and got talking with @painter1, who also hit this problem and reported it by email in May 2019. It turns out the issue affects arrays greater than 8.1 GB, the original cause being a bug in libnetcdf versions for big variables (from @painter1's notes/emails). @dnadeau4 and @doutriaux1 may recall some of the specific details. Note that I may not be using the latest versions of the libraries below.

To Reproduce
Steps to reproduce the behavior:

  1. Install CDAT with: cdms2-3.1.4-py37ha6f5e91_3, libnetcdf-4.6.2-h303dfb8_1003, netcdf-fortran-4.4.5-h0789656_1004
  2. Execute the code attached (which reads larger and larger arrays)
  3. Watch as the summary stats go from real numbers to 0's once the array being read exceeds 8 GB. For the demo below this happens at year 1989 (3rd step of the loop), when 26 years of data are read (the model grid has 60 vertical levels, 384 lats, 320 lons).
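For context, a back-of-envelope calculation (assuming float32 values, 12 monthly fields per year, and the 60 × 384 × 320 grid above) shows the failing reads sit right around 2**31 elements, i.e. the ~8 GiB mark where a 32-bit size counter in an older libnetcdf could overflow. This is a hypothesis consistent with @painter1's notes, not a confirmed diagnosis:

```python
# Rough size arithmetic for the 'so' reads above (assumption: float32,
# i.e. 4 bytes per value, 12 monthly fields per year, 60 x 384 x 320 grid).
LEVELS, LAT, LON = 60, 384, 320
BYTES_PER_VALUE = 4  # float32
elems_per_year = 12 * LEVELS * LAT * LON  # 88,473,600 values per year

for years in (24, 25, 26):  # 1991-2014, 1990-2014, 1989-2014
    n = years * elems_per_year
    gib = n * BYTES_PER_VALUE / 2**30
    print('%d years: %d elements (%.2f GiB), past 2**31 elements: %s'
          % (years, n, gib, n > 2**31))
```

The exact cutoff observed in the demo may also depend on how cdms2 rounds the string time bounds to months, but the reported onset is in this neighbourhood.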

Expected behavior
Big arrays should be read correctly, returning non-zero values.

Desktop (please complete the following information):

  • OS: RHEL7.x

The code to reproduce this:

# imports
import sys
import cdat_info
import cdms2 as cdm
import numpy as np
from socket import gethostname

#%% Define function
def calcAve(var):
    print('type(var):', type(var), '; var.shape:', var.shape)
    # Start querying stat functions
    print('var.min():'.ljust(21), var.min())
    print('var.max():'.ljust(21), var.max())
    print('np.ma.mean(var.data):', np.ma.mean(var.data))  # not mask aware
    # Problem transientVariable.mean() function
    # print('var.mean():'.ljust(21), var.mean())
    print('-----')

#%% Load subset of variable
f = ['/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc']
# Build up progressively larger arrays, stepping back one year at a time
times = np.arange(1991,1984,-1)
print('host:',gethostname())
print('Python version:',sys.version)
print('cdat env:',sys.executable.split('/')[5])
print('cdat version:',cdat_info.version()[0])
print('*****')
for timeSlot in times:
    for filePath in f:
        fH = cdm.open(filePath)
        print('filePath:',filePath.split('/')[-1])
        # Loop through single years
        start, end = timeSlot, 2014
        print('times:',start,end,'; total years:',(end-start)+1)
        d1 = fH('so',time=(str(start),str(end)))
        print("Array size: %d Mb" % ( (d1.size * d1.itemsize) / (1024*1024) ) )
        calcAve(d1)
        del d1
        fH.close()
    print('----- -----')
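As a stopgap until a library fix lands, each individual read can be kept under the threshold by fetching the variable in smaller year spans and assembling the result afterwards. A minimal sketch of the chunking part (the `year_chunks` helper is hypothetical, not a cdms2 API; each (start, end) span would be passed to fH('so', time=(str(start), str(end))) and the pieces concatenated along the time axis):

```python
def year_chunks(start, end, max_years):
    """Split the inclusive year range [start, end] into consecutive
    spans of at most max_years years each."""
    s = start
    while s <= end:
        e = min(s + max_years - 1, end)
        yield (s, e)
        s = e + 1

# 26 years split into spans that each stay well under the ~8 GiB mark
print(list(year_chunks(1989, 2014, 10)))
# → [(1989, 1998), (1999, 2008), (2009, 2014)]
```

This only works around the symptom; the zero-read bug itself still needs fixing in the underlying library.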

@pochedls @muryanto1 @downiec @jasonb5 @gabdulla @gleckler1 @lee1043 ping

Metadata



Labels

pending-release: Fix is included in a pending release of the CDAT metapackage.
