allow integer sample names by idot · Pull Request #1421 · nf-core/rnaseq

idot · 2024-10-18T11:40:43Z

Integers are no longer allowed as sample names (since 3.16.0):
Validation of file failed:
-> Entry 1: Error for field 'sample' (298098): Sample name must be provided and cannot contain spaces

PR checklist

[*] This comment contains a description of changes (with reason).
CHANGELOG.md is updated.

github-actions · 2024-10-18T11:40:54Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

idot · 2024-10-18T11:58:29Z

fixes #1419

pinin4fjords · 2025-01-16T15:51:24Z

@idot can you confirm that you've tested the workflow with this change? Also, please update the CHANGELOG.

idot · 2025-03-03T14:21:41Z

I have updated the changelog. There was a discussion on Slack, and the developers wanted a more comprehensive solution; however, the error is still there in 3.18.0. I have also tested 3.18 with this change.

MatthiasZepper

Mind that R does not allow purely numeric column names by default. If you try assigning one, it will automatically be prefixed with X:

> example <- data.frame("123345"=LETTERS)
> head(example)
  X123345
1       A
2       B
3       C
4       D
5       E
6       F

So you will end up with non-numeric sample names in your quantification anyway, and I would want a very careful review of all R scripts to check whether any data merging steps fail, e.g. the scaling/normalization silently defaulting to 1 for each sample.

I fear there might be more subtle issues that do not show up immediately as a crashed pipeline run.
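
To make the failure mode concrete, here is a minimal Python sketch of how a renamed column could silently fall back to a default. It is an analogy, not code from the pipeline: `r_style_checked_name` is a hypothetical and deliberately partial stand-in for R's renaming rule, and the scaling factors are made-up values.

```python
import re


def r_style_checked_name(name: str) -> str:
    """Rough stand-in for R's name checking: prefix 'X' when the name
    starts with a digit. R's real rule covers more cases than this."""
    return f"X{name}" if re.match(r"\d", name) else name


# Hypothetical per-sample scaling factors, keyed by the samplesheet IDs.
size_factors = {"298098": 0.8, "298504": 1.3}

# Column names as they would look after a checked round trip through R.
columns = [r_style_checked_name(s) for s in size_factors]

# A lenient lookup with a default hides the mismatch: nothing crashes,
# but every sample quietly gets a factor of 1.0.
applied = {col: size_factors.get(col, 1.0) for col in columns}
print(applied)  # {'X298098': 1.0, 'X298504': 1.0}
```

A strict lookup (`size_factors[col]`) would raise instead, which is why lenient merges are the ones worth auditing.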

pinin4fjords · 2025-03-12T15:18:21Z

I did at some point go through and put check.names = FALSE in various places to avoid this.

@idot said that they had tested this, so fingers crossed that was effective.

idot · 2025-03-13T12:20:23Z

Yes, in the R part the sample names get an X prepended

pinin4fjords · 2025-03-13T16:28:03Z

OK, then we need to do some work to address that before this is merged.

…tiQC modes

Add three more cases to tests/integer_samplenames.nf.test:
- full --skip_quantification_merge run: validates the per-sample MultiQC code path (one report per integer-named sample, each with its own table_sample_merge lookbehind);
- --aligner hisat2 stub: validates the non-STAR alignment branch;
- pseudo-only kallisto stub: validates the --skip_alignment route.

Verified per-sample MultiQC output: each sample's report carries its integer ID verbatim, with Read 1 / Read 2 rows for PE samples and a single row for the SE sample.

pinin4fjords · 2026-05-11T13:25:44Z

Pushed an update via maintainer-edit. Why integer sample IDs are safe now, in two steps:

1. Inside Nextflow channels: types stay consistent. With type: ["string", "integer"], nf-schema can hand us an Integer in meta.id. The multiqc_rnaseq subworkflow joins channels keyed by meta.id (typed) against per-sample TSVs whose IDs come from filename parsing (always String). Integer-keyed vs String-keyed .join silently drops samples in Groovy. So this push adds a meta.id as String coercion right after samplesheetToList in workflows/rnaseq/main.nf, before any downstream channel work.

2. Inside R aggregators: no X prefix. Audited the three places that build sample-column tables, all already pass check.names = FALSE:

bin/deseq2_qc.r:58
modules/nf-core/summarizedexperiment/.../summarizedexperiment.r:16-33
modules/nf-core/tximeta/tximport/templates/tximport.r:46, 101, 112, 125, 212

bin/dupradar.r and the Python scripts only use meta.id as a filename prefix, so no column-name path. So @MatthiasZepper's concern doesn't materialise in this pipeline.
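
The key-type mismatch from step 1 can be sketched outside Nextflow. The following is a Python analogy under stated assumptions, not the pipeline's Groovy: `join_by_key` is a hypothetical helper that mimics an inner join on the first tuple element, and the payload strings are placeholders.

```python
def join_by_key(left, right):
    """Inner join of two lists of (key, value) pairs; unmatched keys are dropped."""
    right_by_key = dict(right)
    return [(k, v, right_by_key[k]) for k, v in left if k in right_by_key]


# meta.id as the schema could deliver it (integer) versus IDs parsed
# from file names (always strings).
from_samplesheet = [(298098, "meta"), (298504, "meta")]
from_filenames = [("298098", "stats.tsv"), ("298504", "stats.tsv")]

# Integer 298098 and string "298098" are different keys, so nothing matches
# and no error is raised.
print(join_by_key(from_samplesheet, from_filenames))  # []

# Coercing the key to a string up front restores the match.
coerced = [(str(k), v) for k, v in from_samplesheet]
print(len(join_by_key(coerced, from_filenames)))  # 2
```

Coercing once, right where the samplesheet is read, means no downstream join has to care about the original type.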

Empirical check. Ran the pipeline against a samplesheet with the IDs from #1419 (298098, 298504, 317960, 319093). Five configurations: default full+stub, --skip_quantification_merge full, --aligner hisat2 stub, pseudo-only kallisto stub. All pass; integer IDs verbatim in every output:

$ head -2 salmon.merged.gene_counts.tsv
gene_id	gene_name	298098	298504	317960	319093
Gfp_transgene_gene	Gfp_transgene_gene	0	0	0	0

$ head -5 deseq2.pca.vals.txt
"sample"	"PC1: 52% variance"	"PC2: 27% variance"
"298098"	-1.30898138940436	0.439079416717309
"298504"	-0.872656356507467	-0.654613144978441
"319093"	0.969970988497281	1.06605337716581
"317960"	1.21166675741454	-0.850519648904681

$ cut -f1 multiqc_general_stats.txt | head -7
Sample
298098
298098 Read 1
298098 Read 2
298504
298504 Read 1
298504 Read 2

Per-sample MultiQC under --skip_quantification_merge produced four reports at output/<sample>/multiqc/star_salmon/<sample>_multiqc_report.html, each carrying its own integer ID.
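
A check along these lines could confirm the merged header programmatically. This is a minimal Python sketch, assuming the first two columns are gene_id and gene_name as in the output above; `unexpected_columns` is a hypothetical helper, not part of the pipeline, and the "bad" header is a constructed counter-example.

```python
def unexpected_columns(header_line: str, sample_ids: list[str]) -> list[str]:
    """Return sample columns in a merged-counts header that are not verbatim sample IDs."""
    columns = header_line.rstrip("\n").split("\t")
    sample_columns = columns[2:]  # skip gene_id and gene_name
    expected = set(sample_ids)
    return [c for c in sample_columns if c not in expected]


ids = ["298098", "298504", "317960", "319093"]
good = "gene_id\tgene_name\t298098\t298504\t317960\t319093"
bad = "gene_id\tgene_name\tX298098\t298504\t317960\t319093"

print(unexpected_columns(good, ids))  # []
print(unexpected_columns(bad, ids))   # ['X298098']
```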

Also in this push: reverted the unrelated umi dedup fix for mqc commit (can ship separately if still wanted), merged in current origin/dev (branch was ~1300 commits behind), and slotted the CHANGELOG entry into its numeric position. No regression test for integer IDs, mirroring the rest of the suite which doesn't carry per-naming-quirk cases.

pinin4fjords · 2026-05-11T13:35:18Z

I'm going to merge this now; I think it should be fine. We'll deal with any remaining corner cases if and when they pop up. For now I see no need to block integer IDs.

(Edit: once I figure out why there's a snapshot mismatch)

I've tested this, I think we've headed off the R issues.

idot changed the base branch from master to dev October 18, 2024 11:45

idot mentioned this pull request Oct 18, 2024

3.16.x does not allow numeric sample ids #1419

Open

pinin4fjords approved these changes Mar 3, 2025

MatthiasZepper previously requested changes Mar 11, 2025

idot force-pushed the allow_int_samplenames branch from aee4a2f to 48b6d12 Compare June 12, 2025 08:56

allow integer sample names

e1c52e1

Validation of file failed: -> Entry 1: Error for field 'sample' (298098): Sample name must be provided and cannot contain spaces

idot force-pushed the allow_int_samplenames branch from 48b6d12 to e1c52e1 Compare June 12, 2025 08:58

idot and others added 7 commits June 17, 2025 16:26

umi dedup fix for mqc

7d3daa4

Revert "umi dedup fix for mqc"

8b938da

This reverts commit 7d3daa4.

Merge remote-tracking branch 'origin/dev' into pr-1421-push

b5b2b89

feat(input): allow integer sample names in samplesheet

afff325

Schema accepts ["string", "integer"] for the sample column; meta.id is coerced to String after samplesheetToList so numeric IDs propagate as strings through channel keys, file names, and R column headers. Closes nf-core#1419

test(input): snapshot the integer-sample-name pipeline run

0be4218

Sample IDs (298098, 298504, 317960, 319093) propagate as strings through file names, merged gene-count column headers, and DESeq2 PCA output - no R X-prefixing because every callsite already passes check.names = FALSE.

refactor: trim test infra, comment, and changelog placement

4e43a71

- Drop the integer-id-specific nf-test + fixture: other sample-naming niceties aren't tested here either.
- Strip the verbose coercion comment.
- Move the CHANGELOG entry to its numeric slot.

pinin4fjords requested a review from MatthiasZepper May 11, 2026 13:32

pinin4fjords enabled auto-merge May 11, 2026 13:35

pinin4fjords merged commit 9db8dbf into nf-core:dev May 11, 2026
123 of 125 checks passed

pinin4fjords merged 8 commits into nf-core:dev from idot:allow_int_samplenames
