Describe the bug
I have hit a reproducible error where big arrays (>8.1 GB) are not read correctly: a zero-filled array is returned instead of the real values. I was puzzled by this error until I got talking with @painter1, who hit the same problem and reported it back via email in May 2019. It turns out the issue affects arrays greater than 8.1 GB, and the original error was a bug in libnetcdf versions for big variables (from @painter1's notes/emails). @dnadeau4 and @doutriaux1 may recall some of the specific details. I note I may not be using the latest versions of the libraries below.
To Reproduce
Steps to reproduce the behavior:
- Install CDAT with:
cdms2-3.1.4-py37ha6f5e91_3, libnetcdf-4.6.2-h303dfb8_1003, netcdf-fortran-4.4.5-h0789656_1004
- Execute the code below (which reads larger and larger arrays)
- Watch the summary stats go from real numbers to zeros once the array being read exceeds ~8 GB, which for the demo below happens at start year 1989 (3rd step of the loop), when 26 years of data are read (the model has a vert/horiz grid of 60 vertical levels, 384 lat, 320 lon)
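A quick back-of-envelope check (my own arithmetic, not from @painter1's notes) shows roughly where the reads cross the ~8 GB mark, assuming monthly float32 data on the grid above:

```python
# Estimate the in-memory size (GiB) of the float32 'so' array for a given
# start year, using the grid above: 60 levels x 384 lat x 320 lon, monthly
# data through 2014.
def array_gib(start_year, end_year=2014, nlev=60, nlat=384, nlon=320,
              itemsize=4):
    months = (end_year - start_year + 1) * 12
    return months * nlev * nlat * nlon * itemsize / 2**30

for start in (1991, 1990, 1989, 1988):
    print(start, round(array_gib(start), 2), 'GiB')
```

The 1989 read comes out at about 8.57 GiB, consistent with the failure appearing at the third loop step.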
Expected behavior
Big arrays should be read correctly, returning non-zero values
Desktop (please complete the following information):
The code to reproduce this:
# imports
import sys
import cdat_info
import cdms2 as cdm
import numpy as np
from socket import gethostname

#%% Define function
def calcAve(var):
    print('type(var):', type(var), '; var.shape:', var.shape)
    # Start querying stat functions
    print('var.min():'.ljust(21), var.min())
    print('var.max():'.ljust(21), var.max())
    print('np.ma.mean(var.data):', np.ma.mean(var.data))  # Not mask aware
    # Problem transientVariable.mean() function
    #print('var.mean():'.ljust(21), var.mean())
    print('-----')

#%% Load subset of variable
f = ['/p/css03/esgf_publish/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc']
# Try building up arrays, stepping back a single year at a time
times = np.arange(1991, 1984, -1)
print('host:', gethostname())
print('Python version:', sys.version)
print('cdat env:', sys.executable.split('/')[5])
print('cdat version:', cdat_info.version()[0])
print('*****')
for timeSlot in times:
    for filePath in f:
        fH = cdm.open(filePath)
        print('filePath:', filePath.split('/')[-1])
        # Loop through single years
        start = timeSlot; end = 2014
        print('times:', start, end, '; total years:', (end - start) + 1)
        d1 = fH('so', time=(str(start), str(end)))
        print("Array size: %d Mb" % ((d1.size * d1.itemsize) / (1024 * 1024)))
        calcAve(d1)
        del d1
        fH.close()
    print('----- -----')
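As a possible interim workaround (my own sketch, not a confirmed fix from the report), the statistics could be accumulated over year-sized reads so that no single read exceeds the failure threshold. The `running_stats` helper below is hypothetical; the per-year chunks are simulated here with small NumPy arrays, but in practice each one would come from a separate `fH('so', time=(str(y), str(y)))` call:

```python
import numpy as np

# Combine min/max/mean across a sequence of chunks without ever holding
# the full array in one read. Each chunk stands in for one year of data.
def running_stats(chunks):
    total, count = 0.0, 0
    vmin, vmax = np.inf, -np.inf
    for chunk in chunks:
        total += chunk.sum(dtype=np.float64)  # accumulate in float64
        count += chunk.size
        vmin = min(vmin, chunk.min())
        vmax = max(vmax, chunk.max())
    return vmin, vmax, total / count

# Simulated per-year chunks, one constant value per "year"
chunks = [np.full((12, 2, 3, 4), float(y)) for y in range(1989, 1992)]
vmin, vmax, mean = running_stats(chunks)
print(vmin, vmax, mean)  # -> 1989.0 1991.0 1990.0
```

This trades one large read for many small ones, which should stay well under whatever size libnetcdf mishandles.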
@pochedls @muryanto1 @downiec @jasonb5 @gabdulla @gleckler1 @lee1043 ping