Fix restart-reproducibility tests for ESM models #131

Merged
blimlim merged 3 commits into main from 123-esm-repro-tests
Mar 17, 2025

@blimlim
Collaborator

@blimlim blimlim commented Mar 12, 2025

Closes #123

This PR configures the UM and CICE in ESM1.5/1.6 to write daily restarts, allowing for the restart reproducibility test to be run.

The changes hard-code a daily restart frequency regardless of the run length. This is a risk if anyone modifies the tests to use a longer runtime (e.g. a month or a year).

Is it acceptable as is? I think it's too complicated to set the dump frequency to exactly match any runtime, as you have to start worrying about calendar types and exact start dates.

An alternative would be to require that exactly one of years, months, or seconds is non-zero, and switch the dump frequency to 1 month or 1 day accordingly.
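As a rough illustration of that alternative, here is a minimal sketch. The function name `choose_restart_frequency` and its return labels are hypothetical and not part of model-config-tests. The check that `seconds` is a whole number of days is an extra assumption, added so that a daily dump always lands on the end of the run.

```python
DAY_IN_SECONDS = 86400


def choose_restart_frequency(years: int = 0, months: int = 0, seconds: int = 0) -> str:
    """Pick a restart (dump) frequency from a runtime (hypothetical helper).

    Exactly one of years, months, or seconds must be non-zero.
    """
    runtime = {"years": years, "months": months, "seconds": seconds}
    if any(value < 0 for value in runtime.values()):
        raise ValueError("Runtime values must not be negative")

    nonzero = [name for name, value in runtime.items() if value != 0]
    if len(nonzero) != 1:
        raise ValueError("Exactly one of years, months, or seconds must be non-zero")

    if nonzero[0] in ("years", "months"):
        # Whole-month and whole-year runs end on a month boundary.
        return "1 month"

    # Assumption: second-based runtimes are a whole number of days.
    if seconds % DAY_IN_SECONDS != 0:
        raise ValueError("seconds must be a multiple of one day")
    return "1 day"


print(choose_restart_frequency(seconds=2 * DAY_IN_SECONDS))
```

This sidesteps calendar types and start dates entirely, at the cost of rejecting mixed runtimes such as one month plus one day.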

Let me know what sounds best!

@blimlim
Collaborator Author

blimlim commented Mar 12, 2025

Results from running model-config-tests -m checksum_slow with ESM1.6:

Details
platform linux -- Python 3.10.0, pytest-8.3.2, pluggy-1.5.0
rootdir: /g/data/tm70/sw6175/development/model-config-tests
configfile: pyproject.toml
collected 56 items / 54 deselected / 2 selected                                                                                                                                   

../../../../../../g/data/tm70/sw6175/development/model-config-tests/src/model_config_tests/test_bit_reproducibility.py .F                                                   [100%]

==================================================================================== FAILURES =====================================================================================
____________________________________________________________________ TestBitReproducibility.test_restart_repro ____________________________________________________________________

self = <model_config_tests.test_bit_reproducibility.TestBitReproducibility object at 0x7fa075bb61a0>, output_path = PosixPath('/scratch/tm70/sw6175/tmp/test-model-repro')
control_path = PosixPath('/home/565/sw6175/esm1.6/misc/esm1.6-config-tests')

    @pytest.mark.checksum_slow
    def test_restart_repro(self, output_path: Path, control_path: Path):
        """
        Test that a run reproduces across restarts.
        """
        # First do two short (1 day) runs.
        exp_2x1day = setup_exp(control_path, output_path, "test_restart_repro_2x1day")
    
        # Reconfigure to a 1 day run.
        exp_2x1day.model.set_model_runtime(seconds=DAY_IN_SECONDS)
    
        # Now run twice.
        exp_2x1day.setup_and_run()
        exp_2x1day.force_qsub_run()
    
        # Now do a single 2 day run
        exp_2day = setup_exp(control_path, output_path, "test_restart_repro_2day")
        # Reconfigure
        exp_2day.model.set_model_runtime(seconds=(2 * DAY_IN_SECONDS))
    
        # Run once.
        exp_2day.setup_and_run()
    
        # Now compare the output between our two short and one long run.
        checksums_1d_0 = exp_2x1day.extract_checksums()
        checksums_1d_1 = exp_2x1day.extract_checksums(exp_2x1day.model.output_1)
    
        checksums_2d = exp_2day.extract_checksums()
    
        # Use model specific comparision method for checksums
        model = exp_2day.model
        matching_checksums = model.check_checksums_over_restarts(
            long_run_checksum=checksums_2d,
            short_run_checksum_0=checksums_1d_0,
            short_run_checksum_1=checksums_1d_1,
        )
    
        if not matching_checksums:
            # Write checksums out to file
            with open(output_path / "restart-1d-0-checksum.json", "w") as file:
                json.dump(checksums_1d_0, file, indent=2)
            with open(output_path / "restart-1d-1-checksum.json", "w") as file:
                json.dump(checksums_1d_1, file, indent=2)
            with open(output_path / "restart-2d-0-checksum.json", "w") as file:
                json.dump(checksums_2d, file, indent=2)
    
>       assert matching_checksums
E       assert False

/g/data/tm70/sw6175/development/model-config-tests/src/model_config_tests/test_bit_reproducibility.py:203: AssertionError
------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------
['/scratch/tm70/sw6175/tmp/test-model-repro/control/esm1.6-config-tests-test_restart_repro_2x1day/pre-industrial.o136902524']
['/scratch/tm70/sw6175/tmp/test-model-repro/control/esm1.6-config-tests-test_restart_repro_2x1day/pre-industrial.o136902800']
['/scratch/tm70/sw6175/tmp/test-model-repro/control/esm1.6-config-tests-test_restart_repro_2day/pre-industrial.o136903623']
Unequal checksum: Zonal velocity: 1574373519071171406
Unequal checksum: Meridional velocity: 722598503688239145
Unequal checksum: Advection of u: 8362615193283662449
Unequal checksum: Advection of v: -3840660701720284680
Unequal checksum: rho(taup1): 2401763891545041684
Unequal checksum: pressure_at_depth: 6070880160859439371
Unequal checksum: denominator_r: 7450091737817766717
Unequal checksum: drhodT: 2701312768874660084
Unequal checksum: drhodS: -4822446525774232442
Unequal checksum: drhodz_zt: 5085950976692996945
Unequal checksum: dicr: 1090312346215001837
Unequal checksum: dicp: -5321824269304817747
Unequal checksum: caco3: -6134361231664945613
Unequal checksum: alk: -8368577921643854862
Unequal checksum: dic: -519476380649006889
Unequal checksum: no3: 6701846778156609144
Unequal checksum: phy: 5236946551085187543
Unequal checksum: pchl: 2698893877216789189
Unequal checksum: o2: -2145115294525437188
Unequal checksum: fe: 3063103823050146989
Unequal checksum: zoo: -8866613814422840420
Unequal checksum: det: -7607655366414030736
Unequal checksum: phyfe: 9217340607389289179
Unequal checksum: zoofe: 3534789829686046689
Unequal checksum: detfe: 1430636099284394532
Unequal checksum: temp: 3767086323718300294
Unequal checksum: salt: -8055437140346215164
Unequal checksum: age_global: 4470747867136788496
Unequal checksum: frazil: 4161700590726826246
Unequal checksum: pot_temp: 7174402839448829905
Unequal checksum: caco3_sediment: 6121716580077529807
Unequal checksum: detfe_sediment: -4328723381409103678
Unequal checksum: det_sediment: 875609710308425127
Unequal checksum: ending agm_array: 8131481063776659837
Unequal checksum: ending rossby_radius: 5639903349726307679
Unequal checksum: ending rossby_radius_raw: -2411085555606954148
Unequal checksum: ending bih_viscosity: -63908190401646580
Unequal checksum: ending lap_viscosity: -1656448216462778091
Unequal checksum: eta_t: 197811185933503108
Unequal checksum: eta_u: 1384447974225974858
Unequal checksum: deta_dt: 7338091470409331279
Unequal checksum: eta_t_bar: 134505303301198059
Unequal checksum: pbot_t: 107411556291353603
Unequal checksum: pbot_u: 4257244180533446308
Unequal checksum: anompb: -1113665242045343768
Unequal checksum: patm_t: 4059484261468688232
Unequal checksum: dpatm_dt: -1770351501178801822
Unequal checksum: ps: -566564563633297759
Unequal checksum: grad_ps_1: 24970932498429498
Unequal checksum: grad_ps_2: -1586194668715010228
Unequal checksum: udrho: 7633421642495643864
Unequal checksum: vdrho: 2058828527365985300
Unequal checksum: conv_rho_ud_t: 3991559873282954509
Unequal checksum: source: 3314825453874575291
Unequal checksum: eta smoother: -5908546582980200517
Unequal checksum: forcing_u_bt: -795885718342032438
Unequal checksum: forcing_v_bt: 253967514444989018
Unequal checksum: Thickness%rho_dzt(taup1): 3660438857637561252
Unequal checksum: Thickness%rho_dzu(taup1): 8414569972705524846
Unequal checksum: Thickness%mass_u(taup1): -2669828943362333950
Unequal checksum: Thickness%rho_dzten(1): -8973201281140421437
Unequal checksum: Thickness%rho_dzten(2): 236479310108658834
Unequal checksum: Thickness%rho_dztr: 1313409067986619280
Unequal checksum: Thickness%rho_dzur: 5285698540652554639
Unequal checksum: Thickness%rho_dzt_tendency: 5330223468144886566
Unequal checksum: Thickness%dzt: -2316275764445634383
Unequal checksum: Thickness%dzten(1): 3456719820405900464
Unequal checksum: Thickness%dzten(2): -5776122956501557575
Unequal checksum: Thickness%dztlo: -6219168616239235764
Unequal checksum: Thickness%dztup: -5411069186828556527
Unequal checksum: Thickness%dzt_dst: -575244701768800747
Unequal checksum: Thickness%dzwt(k=0): -7528787488243122926
Unequal checksum: Thickness%dzwt(k=1:nk): -1463409952841687785
Unequal checksum: Thickness%dzu: 3950748096992917289
Unequal checksum: Thickness%dzwu(k=0): 253438666577009791
Unequal checksum: Thickness%dzwu(k=1:nk): 1118627505260272380
Unequal checksum: Thickness%depth_zt: -1512133803888684520
Unequal checksum: Thickness%geodepth_zt: 2666635070415010747
Unequal checksum: Thickness%depth_zu: 8492211902074149411
Unequal checksum: Thickness%depth_zwt: -5263883993751330618
Unequal checksum: Thickness%depth_zwu: -8994706235908302316
Unequal checksum: Thickness%mass_en(1): 2533491913299884235
Unequal checksum: Thickness%mass_en(2): 2980894429900863284

The restart reproducibility test fails. Based on earlier comments from @MartinDix, I think this is to be expected due to the CABLE implementation.
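For anyone digging into a failure like this, listing exactly which fields differ between two runs can help. The following is only a sketch: the function name `find_unequal_checksums` and the flat field-to-checksum dictionary layout are assumptions for illustration, not the structure returned by `extract_checksums()`, and the checksum values are placeholders.

```python
_MISSING = object()


def find_unequal_checksums(long_run: dict, short_run: dict) -> list:
    """Return sorted field names whose checksums differ or exist in only one run."""
    fields = set(long_run) | set(short_run)
    return sorted(
        field
        for field in fields
        if long_run.get(field, _MISSING) != short_run.get(field, _MISSING)
    )


# Placeholder checksums, not values from a real run.
long_run = {"temp": "aaa", "salt": "bbb", "eta_t": "ccc"}
short_run = {"temp": "aaa", "salt": "xxx"}

for field in find_unequal_checksums(long_run, short_run):
    print(f"Unequal checksum: {field}")
```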

@blimlim blimlim marked this pull request as draft March 13, 2025 00:08
@blimlim blimlim requested a review from dougiesquire March 13, 2025 00:09
@blimlim
Collaborator Author

blimlim commented Mar 13, 2025

Hi @dougiesquire, I'm getting failing CI tests with the changes.

I think it's because we don't have a configuration in the resources directory. Is our preference to switch off these tests for ESM1.6 while the configuration is still in rapid development?

Just realised I need to update the ESM1.5 config

@dougiesquire
Collaborator

> Hi @dougiesquire, I'm getting failing CI tests with the changes.
>
> I think it's because we don't have a configuration in the resources directory. Is our preference to switch off these tests for ESM1.6 while the configuration is still in rapid development?

I think it's failing on the ESM1.5 tests because we don't have an example atmosphere/namelists in the test configuration in resources/access/configurations/release-preindustrial+concentrations. Can you copy it in?

@dougiesquire
Collaborator

I think we need to turn off two of the test parameterizations since they no longer make sense. Mind if I push to this branch?

@blimlim
Collaborator Author

blimlim commented Mar 13, 2025

Yeah go for it!

@codecov

codecov Bot commented Mar 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.18%. Comparing base (b5e995f) to head (4c4de2d).
Report is 17 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #131      +/-   ##
==========================================
- Coverage   75.21%   75.18%   -0.03%     
==========================================
  Files          18       18              
  Lines         916      927      +11     
==========================================
+ Hits          689      697       +8     
- Misses        227      230       +3     

Collaborator

@dougiesquire dougiesquire left a comment

LGTM - thanks @blimlim!

@blimlim blimlim marked this pull request as ready for review March 13, 2025 00:59
@blimlim
Collaborator Author

blimlim commented Mar 13, 2025

Thanks @dougiesquire. I'd just kept the PR as a draft while the tests were breaking. I've now marked it as ready for review. Are you happy to re-review?

@blimlim blimlim requested a review from dougiesquire March 13, 2025 01:00
Collaborator

@dougiesquire dougiesquire left a comment

Looks even better this time!

@blimlim
Collaborator Author

blimlim commented Mar 13, 2025

Thanks @dougiesquire! ... looks like it still needs an additional review from someone else before merging. @jo-basevi would you be happy to have a look at these changes?

@blimlim blimlim requested a review from jo-basevi March 13, 2025 01:07
@CodeGat CodeGat self-requested a review March 17, 2025 01:57
@blimlim
Collaborator Author

blimlim commented Mar 17, 2025

Thanks @CodeGat!

@blimlim blimlim merged commit 5c12d7c into main Mar 17, 2025
7 of 8 checks passed
@blimlim blimlim deleted the 123-esm-repro-tests branch March 17, 2025 02:21

Development

Successfully merging this pull request may close these issues.

ACCESS-ESM restart repro tests don't work

3 participants