When MOM6 is run through the NUOPC driver with PARALLEL_RESTARTFILES = True, it successfully writes the per-rank
restart slices (this will be enabled once payu-org/payu#601 is merged):
access-om3.mom6.r.1900-01-02-00000.nc.0000
access-om3.mom6.r.1900-01-02-00000.nc.0001
...
However, rpointer.ocn contains only the basename, e.g. access-om3.mom6.r.1900-01-02-00000.nc.
On the next run, MOM6/NUOPC cannot locate the set of restart slices and fails with:
WARNING: MOM_restart: Unable to find restart file : ...nc.nc
FATAL : MOM_restart: Unable to find any restart files specified by ...
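To check the mismatch between the pointer file and the files on disk, a small diagnostic along these lines can help. This is only a sketch: `find_restart_slices` is a hypothetical helper, not part of MOM6 or payu, and it assumes the slices sit in the same directory as rpointer.ocn and carry a four-digit rank suffix as in the listing above.

```python
import glob
import os


def find_restart_slices(pointer_path):
    """Map each basename listed in a pointer file to its per-rank slice files.

    Hypothetical helper. Assumes slices live next to the pointer file and are
    named <basename>.NNNN with a four-digit rank suffix.
    """
    run_dir = os.path.dirname(os.path.abspath(pointer_path))
    slices = {}
    with open(pointer_path) as f:
        for line in f:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the pointer file
            # Escape the literal path so only the suffix acts as a pattern.
            pattern = glob.escape(os.path.join(run_dir, name)) + ".[0-9][0-9][0-9][0-9]"
            slices[name] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return slices
```

An empty list for a basename means no per-rank slices were found beside the pointer file; a non-empty list confirms that the slices exist even though the pointer names only the basename.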
Manually editing rpointer.ocn to enumerate the .nc.000? files avoids the first fatal error, but startup then crashes with:
NetCDF: Index exceeds dimension bound (variable: Temp)
This is because each slice is opened as a single file, so each rank assumes the file holds the whole grid and tries to read beyond the slice's local dimensions.
The current fix is to keep the basename in rpointer.ocn but let MOM open it in decomposed mode, so that each rank can safely read its own piece. I'll wrap this fix up in a follow-up PR.
Further discussion can be found in ACCESS-NRI/access-om3-configs#592, ACCESS-NRI/access-om3-configs#637, payu-org/payu#601 and payu-org/payu#600.
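The out-of-bounds read can be illustrated with a deliberately simplified 1-D model. This is not how MOM6 performs its I/O; the block decomposition and the names `local_extent` and `read_fits` are assumptions made purely for illustration.

```python
def local_extent(n_global, n_ranks, rank):
    """Return (start, count) of a rank's chunk in a 1-D block decomposition."""
    base, extra = divmod(n_global, n_ranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def read_fits(file_len, start, count):
    """True if reading `count` points from `start` stays inside the file."""
    return start + count <= file_len


n_global, n_ranks, rank = 100, 4, 1
start, count = local_extent(n_global, n_ranks, rank)

# The slice written by this rank only holds its local chunk.
slice_len = count

# Slice opened as a single (global) file: the rank uses its global offset.
as_single_file = read_fits(slice_len, start, count)

# Slice opened in decomposed mode: the rank reads from the slice's own origin.
as_decomposed = read_fits(slice_len, 0, count)
```

In this toy model, rank 1 owns global indices 25 to 49, so treating its 25-point slice as the whole grid asks for data past the end of the file, while a decomposed read starting at the slice's own origin stays in bounds.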