An AMIP config run (modified to run on 4 nodes) got killed overnight with the following error message:
laboratory path: /scratch/tm70/ms2335/access-esm
binary path: /scratch/tm70/ms2335/access-esm/bin
input path: /scratch/tm70/ms2335/access-esm/input
work path: /scratch/tm70/ms2335/access-esm/work
archive path: /scratch/tm70/ms2335/access-esm/archive
Found experiment archive: /scratch/tm70/ms2335/access-esm/archive/20260123-dev-amip-no-map-by-numa-dev-amip-7b152fbd
nruns: 2 nruns_per_submit: 1 subrun: 1
payu: Found modules in /opt/Modules/v4.3.0
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up atmosphere
Checking exe, input and restart manifests
Updating land use for year 1979
Job 159069506.gadi-pbs killed due to exceeding jobfs quota. Quota: 375.0MB, Used: 1.0GB, Host: gadi-cpu-spr-0145
======================================================================================
Resource Usage on 2026-01-23 20:12:22:
Job Id: 159069506.gadi-pbs
Project: tm70
Exit Status: 271 (Linux Signal 15 SIGTERM Termination)
Service Units: 653.12
NCPUs Requested: 416 CPU Time Used: 310:10:34
Memory Requested: 2.0TB Memory Used: 130.48GB
Walltime Requested: 06:30:00 Walltime Used: 00:47:06
JobFS Requested: 1.46GB JobFS Used: 1.0GB
======================================================================================
The released 2-node AMIP config is (presumably) not affected - noting this mostly for myself, in case we increase the number of nodes for the AMIP config.
The error is also stochastic - I have successfully completed quite a few other 4-node runs (and I am not really sure how the jobfs requirement can be stochastic).
The no-risk change would be to increase the current (very low) jobfs quota to something reasonable and avoid this issue, especially since it occurs randomly.
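For reference, a minimal sketch of what that change might look like, assuming the run is driven by a payu `config.yml` and that extra PBS resources can be passed through payu's `qsub_flags` option (the `10GB` value is an illustrative guess, not a measured requirement - it just needs to comfortably exceed the ~1 GB actually used above):

```yaml
# config.yml (sketch, assumptions noted above)
# Pass an explicit jobfs request to PBS so the job is not killed
# against the small default quota (375 MB in the failed run).
qsub_flags: -l jobfs=10GB
```

This only raises the per-job scratch allocation; it does not change why the 4-node run writes more to jobfs than the 2-node one, which would still be worth understanding.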