Tidy-up batch: docs, schema, configs, single trivial correctness fix #1845

@pinin4fjords

Background

A pipeline-wide review against origin/dev (post-3.26.0, SHA e64c3f753) surfaced a handful of small, no-behaviour-change items that are cheaper to bundle into one PR than to ship separately. None of them affects runtime behaviour or test snapshots; the single correctness fix (& → &&) is included here because it's a trivial 3-line change in code paths that already evaluate to the right value (in Groovy, & on boolean operands gives the same result as &&, only without short-circuiting).

This is a single-PR, single-CI-run task. Each sub-item below is independently verifiable.

Tasks

1. README QC list — Preseq is not a default; RustQC is missing

README.md lines 46-50 list Preseq inside the "Extensive quality control" section as if it runs by default:

14. Extensive quality control:
    1. [`RSeQC`](http://rseqc.sourceforge.net/)
    2. [`Qualimap`](http://qualimap.bioinfo.cipf.es/)
    3. [`dupRadar`](https://bioconductor.org/packages/release/bioc/html/dupRadar.html)
    4. [`Preseq`](http://smithlabresearch.org/software/preseq/)
    5. [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html)

But nextflow.config:107 defaults skip_preseq = true, and docs/usage.md:398 correctly says "does not normally run Preseq". RustQC has full sections in docs/usage.md:372-409 and docs/output.md:546+ but is not mentioned in the README at all.

Action: in README.md, mark Preseq as optional (e.g. [Preseq](...) (*disabled by default*, enable with --skip_preseq false or via --use_rustqc)). Add one bullet pointing at RustQC as a unified replacement for RSeQC/Qualimap/dupRadar/Preseq/SAMtools-stats, e.g. Or use [RustQC](https://github.com/seqeralabs/rustqc) (single-pass replacement for RSeQC/Qualimap/dupRadar/Preseq, enabled with --use_rustqc; see usage docs).

2. Schema: drop TODO comments leaked into user-facing description text

nextflow_schema.json:553 and :559 — the kallisto_quant_fraglen and kallisto_quant_fraglen_sd description fields end with "TODO: use existing RSeQC results to do this dynamically.". This text is rendered verbatim in the schema-generated help.

Action:

  • Strip the TODO: ... sentence from both descriptions.
  • Add a unit clarification — these values are in base pairs.
  • If the TODO is still genuinely on someone's roadmap, open a tracking issue and link it from a code comment in workflows/rnaseq/main.nf near the kallisto invocation; do not leave it in user-facing schema text.
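The stripping step could be scripted rather than done by hand. The following is a minimal sketch, assuming the TODO sentence always sits at the very end of a description; the helper name `strip_todo`, the group name `example_group` and the placeholder description text are hypothetical, and only the TODO sentence itself is taken from the schema.

```python
import re

# Matches a trailing "TODO: ..." sentence (and the whitespace before it).
# Assumption: the TODO is the last sentence of the description.
TODO_TAIL = re.compile(r"\s*TODO:.*$", re.DOTALL)


def strip_todo(node):
    """Return a copy of a parsed schema with trailing TODOs removed from descriptions."""
    if isinstance(node, dict):
        return {
            key: TODO_TAIL.sub("", value)
            if key == "description" and isinstance(value, str)
            else strip_todo(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_todo(item) for item in node]
    return node


# Hypothetical fragment; the real description text differs.
sample = {
    "$defs": {
        "example_group": {
            "properties": {
                "kallisto_quant_fraglen": {
                    "type": "integer",
                    "description": "Placeholder text. TODO: use existing RSeQC results to do this dynamically.",
                }
            }
        }
    }
}
print(strip_todo(sample))
```

To apply it to the real file, load nextflow_schema.json with `json.load`, pass the result through `strip_todo`, and write it back with the repository's existing indentation. The base-pair unit note still needs to be added by hand.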

3. Schema: declare "default": false for skip_* params that are missing it

In the QC skip block of nextflow_schema.json, only skip_preseq (line 800) declares "default": true. The others default to false in nextflow.config but the schema doesn't say so:

  • skip_dupradar (line 803) — nextflow.config:108 defaults to false
  • skip_qualimap (line 808) — nextflow.config:109 defaults to false
  • skip_rseqc (line 813) — nextflow.config:118 defaults to false
  • skip_biotype_qc (line 818) — verify default in nextflow.config
  • skip_deseq2_qc (line 823) — verify default in nextflow.config

Action: add "default": false to each of the five entries to match nextflow.config, after confirming the nextflow.config defaults for skip_biotype_qc and skip_deseq2_qc. While there, append a note to each description that the param has no effect when --use_rustqc is enabled (RustQC subsumes these tools).
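A quick way to confirm that no skip_* entry was missed is to walk the schema and report the entries without a "default" key. This is a rough sketch, not part of nf-core tooling; the function name `missing_defaults` and the group name `example_group` are hypothetical, and it assumes parameters live under "properties" objects somewhere in the schema.

```python
def missing_defaults(node, prefix="skip_"):
    """Return sorted names of params starting with `prefix` that declare no default."""
    found = []
    if isinstance(node, dict):
        properties = node.get("properties", {})
        if isinstance(properties, dict):
            for name, spec in properties.items():
                if name.startswith(prefix) and isinstance(spec, dict) and "default" not in spec:
                    found.append(name)
        # Recurse so that nested groups are covered as well.
        for value in node.values():
            found.extend(missing_defaults(value, prefix))
    elif isinstance(node, list):
        for item in node:
            found.extend(missing_defaults(item, prefix))
    return sorted(found)


# Hypothetical fragment mirroring the situation described above.
sample_schema = {
    "$defs": {
        "example_group": {
            "properties": {
                "skip_preseq": {"type": "boolean", "default": True},
                "skip_dupradar": {"type": "boolean"},
                "skip_qualimap": {"type": "boolean"},
            }
        }
    }
}
print(missing_defaults(sample_schema))
```

Run against the parsed nextflow_schema.json after the edit, the list should no longer contain any of the five QC entries.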

4. workflows/rnaseq/main.nf:429,459,782 (& → &&)

Three guard conditions use bitwise AND instead of logical AND:

if (!params.skip_qc & !params.skip_deseq2_qc & !params.skip_quantification_merge) {

Behaviour is identical because every operand is a boolean (the result of !), and Groovy's & on booleans is a logical AND that simply does not short-circuit. It is, however, inconsistent with the rest of the file (see workflows/rnaseq/main.nf:638 for an && example) and is a recurring "did you mean...?" question for reviewers.

Action: change all three lines to use &&. Pure correctness/style fix, no snapshot impact.
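To check that no other guard in the file has the same issue, a small scan can help. This is a sketch under the assumption that the affected guards sit on a single line beginning with `if`; the helper names are hypothetical, and any hit should be reviewed by hand, since a lone & inside a string or a genuine bitwise expression would also match.

```python
import re

# A single '&' that is not part of '&&'.
SINGLE_AMP = re.compile(r"(?<!&)&(?!&)")


def find_single_amp_guards(source):
    """Return (line_number, stripped_line) for `if` lines containing a lone '&'."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("if") and SINGLE_AMP.search(line):
            hits.append((lineno, line.strip()))
    return hits


def to_logical_and(line):
    """Rewrite every lone '&' in a line to '&&', leaving existing '&&' untouched."""
    return SINGLE_AMP.sub("&&", line)


guard = "if (!params.skip_qc & !params.skip_deseq2_qc & !params.skip_quantification_merge) {"
print(find_single_amp_guards(guard))
print(to_logical_and(guard))
```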

5. modules/local/deseq2_qc/main.nf — replace shell-substitution loop in stub

Lines 75-78 of the stub block use backtick command-substitution against the $counts input file:

for i in `head $counts -n 1 | cut -f3-`;
do
    touch size_factors/\${i}.size_factors.RData
done

Stubs should produce deterministic output without depending on input content. nf-core convention is fixed touch calls.

Action: replace the loop with one or two fixed touches, e.g. touch size_factors/sample1.size_factors.RData. If a richer set is needed for downstream stub testing, add a comment explaining why.

6. conf/modules/featurecounts.config:14 — drop duplicate withName block

The selector withName: 'CUSTOM_MULTIQCCUSTOMBIOTYPE' exists in two files:

  • conf/modules/multiqc_custom_biotype.config:2 (the natural home)
  • conf/modules/featurecounts.config:14 (duplicate)

Per inclusion order in nextflow.config, the multiqc_custom_biotype.config block wins, but the featurecounts.config block silently shadows what readers expect to be authoritative.

Action: delete lines 14-20 (the entire withName: 'CUSTOM_MULTIQCCUSTOMBIOTYPE' block) from conf/modules/featurecounts.config. Keep conf/modules/multiqc_custom_biotype.config as the single source.

7. conf/modules/align_star.config:6,140 — merge two adjacent withName blocks for the same selector

Two consecutive blocks use the character-identical selector '.*ALIGN_STAR:STAR_ALIGN|.*ALIGN_STAR:SENTIEON_STAR_ALIGN|.*ALIGN_STAR:PARABRICKS_RNA_FQ2BAM'. The first block sets ext.args, the second sets publishDir.

Action: merge the two blocks into one with both ext.args and publishDir. No behaviour change; pure readability.
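Items 6 and 7 both come down to one withName selector being declared more than once, either across files or within a single file. A rough way to surface such cases is sketched below; it assumes each selector is written on one line as `withName: '...'`, and the function name and sample config text are hypothetical.

```python
import re
from collections import defaultdict

# Captures the quoted selector that follows `withName:`.
WITH_NAME = re.compile(r"""withName:\s*(['"])(.+?)\1""")


def duplicate_selectors(configs):
    """Map each selector declared more than once to its (path, line_number) locations.

    `configs` maps a file path to that file's text.
    """
    seen = defaultdict(list)
    for path, text in configs.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = WITH_NAME.search(line)
            if match:
                seen[match.group(2)].append((path, lineno))
    return {selector: locations for selector, locations in seen.items() if len(locations) > 1}


# Hypothetical config text, not the real file contents.
sample_configs = {
    "a.config": "process {\n    withName: 'X' {\n        ext.args = ''\n    }\n}\n",
    "b.config": "process {\n    withName: 'X' {\n    }\n    withName: 'Y' {\n    }\n}\n",
}
print(duplicate_selectors(sample_configs))
```

Feeding it the contents of every file under conf/modules/ should report the two selectors named in items 6 and 7 before the change and nothing for them afterwards.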

8. Remove bin/fastq_dir_to_samplesheet.py

The script is not called by any .nf file in the pipeline. Git history confirms it is unmaintained:

  • First added 2021-06-17 (fb916c7f0)
  • Last functional change in 2023; the 2023-11 commit was a cross-repo "Add authors and licenses to scripts in bin/ where missing" sweep, not a content change
  • Zero commits in the last 2 years (since 2024-05)
  • The CHANGELOG already directs users to nf-core/fetchngs for samplesheet generation, which supersedes this script's purpose

Action:

  1. Delete bin/fastq_dir_to_samplesheet.py.
  2. Add a CHANGELOG entry under the next release: one sentence noting the removal and pointing users at nf-core/fetchngs (e.g. "Remove unmaintained bin/fastq_dir_to_samplesheet.py — use nf-core/fetchngs for samplesheet generation").
  3. If docs/usage.md references the script anywhere (grep first), strip those references and replace with a one-liner pointing to nf-core/fetchngs.
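The "grep first" step in the list above can be done with plain grep; an equivalent sketch that also records line numbers is shown here. The function name and the default suffix list are assumptions chosen for illustration, not an existing helper in the pipeline.

```python
from pathlib import Path


def find_references(root, needle, suffixes=(".md", ".nf", ".config", ".yml", ".yaml")):
    """Return (relative_path, line_number) for every line under `root` containing `needle`."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

Calling `find_references(".", "fastq_dir_to_samplesheet")` from the repository root lists every remaining mention. Hits in CHANGELOG.md are expected, since the removal entry itself names the script.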

Verification

  • Run nf-core pipelines lint from the worktree — schema changes should pass.
  • Run nf-test test --profile=+test,docker --tag default to exercise the changed main.nf paths once. Expect zero snapshot diffs (none of these items change runtime).
  • Eyeball the generated MultiQC report from results/multiqc/ to confirm the README/schema text changes haven't broken any anchor links.

Acceptance criteria

  • Each of the 8 sub-items above is addressed in the same PR
  • nf-core pipelines lint passes
  • nf-test test --tag default passes with zero snapshot diffs
  • CHANGELOG entry added under the next release section, summarising the bundle in one or two sentences (do not list each sub-item individually — rationale belongs in the PR description)

Notes for the implementer

  • All eight items are independent. If any one of them turns out to be more involved than expected (e.g. doc rewrite for fastq_dir_to_samplesheet.py reveals broken behaviour), drop it from this PR and open a follow-up issue rather than expanding the scope.
  • Keep the diff scannable. If after sub-items 1-7 the diff is already large, drop sub-item 8 to a follow-up.
