Nextflow should not stage files that have the same name #470

@marcelm

I collect files from multiple subdirectories and work on them in a single process. Nextflow does not complain if two files have the same basename, which leads to silent data loss. It seems that when it stages them, the second symlink overwrites the first one in the working directory.

To reproduce, create two files with the same basename:

mkdir subdir1 subdir2 && echo hello > subdir1/file && echo world > subdir2/file

and then run this workflow:

c = Channel.from([
  [file('subdir1/file'), file('subdir2/file')]])

process p {
  publishDir '.'

  input: file(x) from c
  output: file('concatenated')

  "cat $x > concatenated"
}

The intention was to get an output file that contains hello\nworld\n. Instead, I get world\nworld\n.
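The observed output is consistent with staging behaving like a forced symlink (`ln -sf`) into the work directory. The following Python sketch simulates that suspected mechanism. It is only an illustration under that assumption, not Nextflow's actual staging code, and the names `stage` and `demo` are made up for this example.

```python
import os
import tempfile


def stage(sources, workdir):
    """Symlink each source into workdir under its basename.

    ASSUMPTION: an existing link with the same name is replaced,
    as `ln -sf` would do. This mimics the suspected behaviour only.
    """
    staged = []
    for src in sources:
        link = os.path.join(workdir, os.path.basename(src))
        if os.path.lexists(link):
            os.remove(link)  # the earlier link is silently lost here
        os.symlink(os.path.abspath(src), link)
        staged.append(link)
    return staged


def demo():
    """Recreate subdir1/file and subdir2/file, stage them, and concatenate."""
    with tempfile.TemporaryDirectory() as root:
        sources = []
        for sub, text in (("subdir1", "hello\n"), ("subdir2", "world\n")):
            os.makedirs(os.path.join(root, sub))
            path = os.path.join(root, sub, "file")
            with open(path, "w") as fh:
                fh.write(text)
            sources.append(path)

        workdir = os.path.join(root, "work")
        os.makedirs(workdir)
        staged = stage(sources, workdir)

        # Equivalent of `cat file file`: both entries name the same link.
        parts = []
        for path in staged:
            with open(path) as fh:
                parts.append(fh.read())
        return "".join(parts)
```

Under this assumption both staged names resolve to the file from subdir2, so the concatenation contains the second file's content twice.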

To give a little context: in the actual pipeline, the process works with multiple FASTQ files that come from the same individual but were sequenced in different runs. They are stored in different directories, but the file basenames follow the standard Illumina scheme <sample-name>_S<sample-index>_L<lane-index>_R1_001.fastq.gz. Because the sample name is identical (the files come from the same individual), a collision occurs whenever, by chance, another run of that sample used the same sample index and the same lane.
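As a stopgap, a check like the following could be run over the input list before it is handed to the process. It is a minimal Python sketch of the kind of basename check I would expect Nextflow to perform. The function name `find_basename_collisions` and the run directories in the usage example are hypothetical.

```python
import os
from collections import defaultdict


def find_basename_collisions(paths):
    """Return {basename: [paths...]} for every basename used more than once."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[os.path.basename(path)].append(path)
    # Keep only the basenames that would collide when staged side by side.
    return {name: srcs for name, srcs in by_name.items() if len(srcs) > 1}


# Hypothetical inputs: the same sample sequenced in two runs.
example_inputs = [
    "run1/sampleA_S1_L001_R1_001.fastq.gz",
    "run2/sampleA_S1_L001_R1_001.fastq.gz",
    "run2/sampleA_S2_L002_R1_001.fastq.gz",
]
collisions = find_basename_collisions(example_inputs)
```

Here `collisions` has a single entry for `sampleA_S1_L001_R1_001.fastq.gz` that lists both the run1 and run2 paths. A pipeline could abort with an error when the result is non-empty instead of silently losing data.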
