Improve caching behaviour for processes using VALIDATOR outputs #619

Merged
atrigila merged 4 commits into nf-core:dev from atrigila:caching-behaviour
Jan 27, 2026

atrigila commented Jan 22, 2026

Closes #520

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Jan 22, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit ab541ec

  • ✅ 379 tests passed
  • ❔ 10 tests were ignored
  • ❔ 1 tests had warnings
  • ❗ 20 tests had warnings

❗ Test warnings:

  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)
  • schema_description - No description provided in schema for parameter: deseq2_seed
  • schema_description - No description provided in schema for parameter: dream_p_value
  • schema_description - No description provided in schema for parameter: dream_lfc
  • schema_description - No description provided in schema for parameter: dream_confint
  • schema_description - No description provided in schema for parameter: dream_proportion
  • schema_description - No description provided in schema for parameter: dream_stdev_coef_lim
  • schema_description - No description provided in schema for parameter: dream_trend
  • schema_description - No description provided in schema for parameter: dream_robust
  • schema_description - No description provided in schema for parameter: dream_winsor_tail_p
  • schema_description - No description provided in schema for parameter: dream_ddf
  • schema_description - No description provided in schema for parameter: dream_reml
  • schema_description - No description provided in schema for parameter: dream_apply_voom
  • schema_description - No description provided in schema for parameter: dream_adjust_method

❔ Tests ignored:

  • files_exist - File is ignored: assets/multiqc_config.yml
  • nextflow_config - Config default ignored: params.report_file
  • nextflow_config - Config default ignored: params.logo_file
  • nextflow_config - Config default ignored: params.css_file
  • nextflow_config - Config default ignored: params.citations_file
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: assets/nf-core-differentialabundance_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-differentialabundance_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-differentialabundance_logo_dark.png
  • multiqc_config - multiqc_config

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-01-26 18:22:35

@atrigila atrigila marked this pull request as ready for review January 22, 2026 14:47
@atrigila atrigila requested review from grst and pinin4fjords January 22, 2026 14:48
grst (Member) left a comment

This works, and the runtime cost should be minimal since we are not dealing with huge data in differentialabundance.

I'd still be curious what @pinin4fjords thinks of this approach.
tl;dr: Any change in the inputs causes VALIDATOR to re-run, which invalidates the cache for everything downstream, even for processes the change doesn't affect. Setting cache = 'deep' on any process that consumes VALIDATOR output solves this issue. For more details, see #520.

Comment thread conf/modules.config Outdated
}

withName: CUSTOM_MATRIXFILTER {
    cache = 'deep'

I'd add a comment to each of these entries to explain the why.
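
As a rough illustration of that suggestion, a commented entry might look like the sketch below. The comment wording is an assumption and not the text that was merged; `cache = 'deep'` is the Nextflow process directive value that indexes input files by their content rather than by path, size, and last-modified timestamp.

```groovy
process {
    // Sketch only: the comment wording here is illustrative, not the merged text.
    // VALIDATOR re-runs whenever any of its inputs change, and its outputs then
    // land in a new work directory. With the default cache mode, that alone
    // invalidates downstream tasks, even when the file contents are identical.
    // 'deep' hashes input file content, so unchanged files still hit the cache.
    withName: CUSTOM_MATRIXFILTER {
        cache = 'deep'
    }
}
```

The trade-off raised in this thread still applies: content hashing reads every input file in full, so the cost grows with input size.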

pinin4fjords (Member) commented Jan 23, 2026

This makes me a little nervous: we're assuming a lot about the size of the inputs future folks will supply and the scale they'll be running at.

I'd be more comfortable if this was implemented via a profile so folks would do -profile docker,cache_deep or similar.

grst (Member) commented Jan 23, 2026

What I meant was that we are just operating on a fundamentally different scale in this pipeline than with, say, BAM files.

I can live with the profile, but I think it would be a reasonable default if the caching just worked as expected, especially with large datasets.

pinin4fjords (Member) commented

> What I meant was that we are just operating on a fundamentally different scale in this pipeline than with, say, BAM files.
>
> I can live with the profile, but I think it would be a reasonable default if the caching just worked as expected, especially with large datasets.

Yes, but we're still significantly ramping up the overheads (even if from a very low base). How about a compromise: have it on by default, but give me a profile to turn it off and reset cache to default? That way the option is handy if the caching causes issues for someone.
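
A minimal sketch of what such an opt-out profile could look like, assuming a hypothetical profile name `cache_default` (the name actually used in the pipeline is not stated in this thread) and showing only the one process visible in the review snippet:

```groovy
profiles {
    // Hypothetical profile name; the pipeline may use a different one.
    cache_default {
        process {
            // Reset to Nextflow's standard caching, which indexes input
            // files by path, size, and last-modified timestamp.
            withName: CUSTOM_MATRIXFILTER {
                cache = true
            }
        }
    }
}
```

Under these assumptions it would be enabled with something like `-profile docker,cache_default`. Whether the profile's selector actually overrides the entry in conf/modules.config depends on how the pipeline's config files are included, so the resolved value is worth checking with `nextflow config -profile docker,cache_default`.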

grst (Member) commented Jan 23, 2026

OK, let's do that! Also, if you have any suggestions for solving this differently, we are happy to give them a try!

pinin4fjords (Member) commented

> OK, let's do that! Also, if you have any suggestions for solving this differently, we are happy to give them a try!

I don't have a more cunning plan. The alternative might be to validate different things (matrices, contrasts) separately, but that has its own issues with complexity. This approach is reasonable.

atrigila (Author) commented

@pinin4fjords @grst I added the default profile and an explanation.

@atrigila atrigila merged commit 9b538e0 into nf-core:dev Jan 27, 2026
35 checks passed