reverting paramsheet to profiles#644
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
There was a problem hiding this comment.
Pull request overview
This PR implements the dual-mode configuration approach from #472 by making profiles the default (single-run) mechanism and only using paramsheets for explicit multi-run benchmarking. It updates pipeline configuration resolution, adds analysis profiles, and refactors tests/docs accordingly.
Changes:
- Switch paramset resolution to use paramsheet only when
--paramsheetis provided, otherwise run in profile (single-run) mode. - Add/restore analysis profiles (rnaseq/affy/maxquant/soft and variants) via new
conf/**.configfiles andnextflow.configprofile entries. - Update docs and tests to reflect the two-mode architecture; remove the shipped default paramsheet.
Reviewed changes
Copilot reviewed 39 out of 42 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_soft.nf.test.snap | Snapshot updated for soft profile output paths and meta. |
| tests/test_soft.nf.test | Test now loads conf/soft/soft.config and sets explicit inputs instead of paramsheet. |
| tests/test_rnaseq_limma.nf.test | Tests now use rnaseq limma profile configs and explicit inputs (no paramsheet). |
| tests/test_rnaseq_dream.nf.test.snap | Snapshot updated for rnaseq dream output path naming and meta. |
| tests/test_rnaseq_dream.nf.test | Tests now use rnaseq dream profile configs and explicit inputs/overrides. |
| tests/test_rnaseq_deseq2.nf.test | Tests now use rnaseq DESeq2 profile configs and explicit inputs/overrides. |
| tests/test_nogtf.nf.test.snap | Snapshot updated for rnaseq “no gtf” output paths and meta. |
| tests/test_nogtf.nf.test | Test now loads conf/rnaseq/rnaseq.config and sets explicit inputs. |
| tests/test_maxquant.nf.test.snap | Snapshot updated for maxquant output paths and meta. |
| tests/test_maxquant.nf.test | Test now loads conf/maxquant/maxquant.config and sets explicit inputs. |
| tests/test_many.nf.test.snap | Snapshot updated for multi-run test paramset naming and meta. |
| tests/test_many.nf.test | Multi-run test still uses paramsheet mode but with renamed test paramsets. |
| tests/test_affy.nf.test | Tests now load affy profile configs and set explicit inputs/overrides. |
| tests/default.nf.test | “with formula” test now uses conf/test.config and explicit overrides. |
| subworkflows/local/utils_nfcore_differentialabundance_pipeline/main.nf | Paramset resolution now switches modes based on params.paramsheet; default config uses profile paramset_name. |
| nextflow_schema.json | Schema text updated to explain two-mode behavior; paramsheet default removed. |
| nextflow.config | params.paramsheet default set to null; adds analysis profiles under profiles {}. |
| docs/usage.md | Documents single-run vs multi-run modes and precedence; updates examples/profiles listing. |
| conf/testdata.yaml | Removed legacy testdata paramsets file. |
| conf/test_paramsheet.yaml | Refactored to be a self-contained test paramsheet (data/config/test blocks). |
| conf/test.config | Now includes rnaseq DESeq2+GSEA config instead of setting paramset_name directly. |
| conf/soft/soft.config | New soft analysis profile config. |
| conf/rnaseq/rnaseq_limma_gsea.config | New rnaseq limma+GSEA profile config. |
| conf/rnaseq/rnaseq_limma_gprofiler2.config | New rnaseq limma+gprofiler2 profile config. |
| conf/rnaseq/rnaseq_limma_decoupler.config | New rnaseq limma+decoupler profile config. |
| conf/rnaseq/rnaseq_limma.config | New rnaseq limma base profile config. |
| conf/rnaseq/rnaseq_dream_decoupler.config | New rnaseq dream+decoupler profile config. |
| conf/rnaseq/rnaseq_dream.config | New rnaseq dream base profile config. |
| conf/rnaseq/rnaseq_deseq2_gsea.config | New rnaseq DESeq2+GSEA profile config. |
| conf/rnaseq/rnaseq_deseq2_gprofiler2.config | New rnaseq DESeq2+gprofiler2 profile config. |
| conf/rnaseq/rnaseq.config | New rnaseq base (DESeq2 default) profile config. |
| conf/paramsheet.yaml | Removed shipped default paramsheet. |
| conf/maxquant/maxquant.config | New maxquant analysis profile config. |
| conf/affy/affy_limma_gsea.config | New affy limma+GSEA profile config. |
| conf/affy/affy_limma_gprofiler2.config | New affy limma+gprofiler2 profile config. |
| conf/affy/affy.config | New affy base profile config. |
| README.md | Updates usage guidance to analysis profiles + optional paramsheet mode. |
| CHANGELOG.md | Adds entry describing the profile-first reversion. |
Comments suppressed due to low confidence (1)
conf/test_paramsheet.yaml:184
affy_testdatapoints to a human GMT (h.all...Hs.symbols.gmt), but the derived test config setsgprofiler2_organism: "mmusculus", which overrides the GMT and queries mouse gene sets. Consider changing this tohsapiens(or omit it) so the test configuration matches the dataset species.
- paramset_name: test_affy_limma_gprofiler2
include: test_paramsheet/affy_limma_gprofiler2,test_paramsheet/affy_testdata
observations_id_col: name
observations_name_col: name
exploratory_main_variable: contrasts
differential_max_pval: 0.05
differential_max_qval: 0.05
differential_min_fold_change: 1.5
gprofiler2_organism: "mmusculus"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| differential_min_fold_change = 1.5 | ||
|
|
||
| // gprofiler2 | ||
| gprofiler2_organism = 'mmusculus' |
There was a problem hiding this comment.
This test uses a human Hallmark GMT (h.all...Hs.symbols.gmt) but sets gprofiler2_organism = 'mmusculus', which will override the GMT and query mouse gene sets instead. Set this to hsapiens (or leave it unset to keep using the provided human GMT) to avoid silently testing the wrong species.
| gprofiler2_organism = 'mmusculus' | |
| gprofiler2_organism = 'hsapiens' |
There was a problem hiding this comment.
@copilot create a issue to check this out (indeed the GSE50790 study uses human samples)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@suzannejin are you happy with this? Have you reviewed the AI's work yet? |
grst
left a comment
There was a problem hiding this comment.
The re-added profiles and docs-related changes LGTM.
I think currently the custom logic in getParamsetConfigurations() gets triggered no matter if a paramset is defined or not. I'm wondering if we could get a clearer separation of the "standard" and "custom" logic, by not even invoking that code if no paramsheet is set.
| // Use paramsheet if paramset_name is provided, otherwise use default params | ||
| def paramsets = (params.paramset_name) ? getParamsheetConfigurations() : getDefaultConfigurations() | ||
| // Use paramsheet if --paramsheet is explicitly provided, otherwise use default params (profile mode) | ||
| def paramsets = (params.paramsheet) ? getParamsheetConfigurations() : getDefaultConfigurations() |
There was a problem hiding this comment.
What's the default configuration? Why do we even have to invoke getParamsetConfigurations() when no paramsheet is specified?
There was a problem hiding this comment.
Agree with @grst . The name getParamsetConfigurations() implies paramsheet logic, but in profile mode it just wraps params into a list. The calling code could handle the branching directly:
def configurations = params.paramsheet
? getParamsheetConfigurations()
: getDefaultConfigurations()...with the shared validation/cleanup (the paramsets.collect { ... } block) extracted to something like validateConfigurations(paramsets). That way the paramsheet code path is never entered when it's not needed, and the two modes are structurally separated rather than just branched inside a single function.
Not blocking, but it would make the intent clearer for future maintainers.
Hi @pinin4fjords , it is ready for me. |
|
|
||
| // Analysis profiles (single-run mode) | ||
| rnaseq { includeConfig 'conf/rnaseq/rnaseq.config' } | ||
| rnaseq_nogtf { includeConfig 'conf/rnaseq/rnaseq_nogtf.config' } |
| // Use paramsheet if paramset_name is provided, otherwise use default params | ||
| def paramsets = (params.paramset_name) ? getParamsheetConfigurations() : getDefaultConfigurations() | ||
| // Use paramsheet if --paramsheet is explicitly provided, otherwise use default params (profile mode) | ||
| def paramsets = (params.paramsheet) ? getParamsheetConfigurations() : getDefaultConfigurations() |
There was a problem hiding this comment.
Agree with @grst . The name getParamsetConfigurations() implies paramsheet logic, but in profile mode it just wraps params into a list. The calling code could handle the branching directly:
def configurations = params.paramsheet
? getParamsheetConfigurations()
: getDefaultConfigurations()...with the shared validation/cleanup (the paramsets.collect { ... } block) extracted to something like validateConfigurations(paramsets). That way the paramsheet code path is never entered when it's not needed, and the two modes are structurally separated rather than just branched inside a single function.
Not blocking, but it would make the intent clearer for future maintainers.
| observations_name_col = 'sample' | ||
|
|
||
| // Differential analysis (DESeq2) | ||
| differential_method = 'deseq2' |
There was a problem hiding this comment.
We should probably document that you should now change method via profile usage, since something like this would break things due to column mismatches:
-profile rnaseq --differential_method limma
| outdir = "$outputDir" | ||
|
|
||
| // Input data (rnaseq_testdata_complex) | ||
| study_name = 'SRP254919' |
There was a problem hiding this comment.
These test data items are duplicated extensively across the test files. The rnaseq test data (SRP254919.samplesheet.csv, SRP254919.salmon.merged...tsv, etc.) appears 10 times across test_rnaseq_deseq2.nf.test (4 cases), test_rnaseq_limma.nf.test (4 cases), test_rnaseq_dream.nf.test (1 case), and test_nogtf.nf.test (1 case). Before this PR these were centralized in testdata.yaml.
I think we could have e.g. a conf/testdata/rnaseq.config with the shared test data params, which a conf/test/rnaseq_limma.config could include alongside the analysis config, and then test_rnaseq_limma.nf.test would use that as its nf-test config. That way each test only needs to specify its case-specific overrides, and a URL change is a one-file fix.
|
@suzannejin I've opened a new pull request, #654, to work on those changes. Once the pull request is ready, I'll request review from you. |
|
@pinin4fjords I have implemented what you suggested here, is this what you meant? |
Yep, basically! |
pinin4fjords
left a comment
There was a problem hiding this comment.
I'm going to approve this to not block you. FYI I'm on leave next week.
|
@pinin4fjords thanks a lot :) |
Solves #472