Fastq from unmapped reads by maxibor · Pull Request #189 · nf-core/eager

maxibor · 2019-04-09T12:43:45Z

To answer #187
Comes with a Python script (using pysam) to recreate FASTQ files from the unmapped reads.

However, this uses Python 3, and the Python in the conda env is currently 2.7 :(

apeltzer · 2019-04-09T13:31:49Z

But the current code already does this?

Lines 846 to 898 in bc55df3

    
    process samtools_filter {
        tag "$prefix"
        publishDir "${params.outdir}/samtools/filter", mode: 'copy',
        saveAs: {filename ->
                if (filename.indexOf(".fq.gz") > 0) "unmapped/$filename"
                else if (filename.indexOf(".unmapped.bam") > 0) "unmapped/$filename"
                else if (filename.indexOf(".filtered.bam")) filename
                else null
        }
        input:
        file bam from ch_mapped_reads_filter.mix(ch_mapped_reads_filter_cm,ch_bwamem_mapped_reads_filter)
        output:
        file "*filtered.bam" into ch_bam_filtered_qualimap, ch_bam_filtered_dedup, ch_bam_filtered_markdup, ch_bam_filtered_pmdtools, ch_bam_filtered_angsd, ch_bam_filtered_gatk
        file "*.fastq.gz" optional true
        file "*.unmapped.bam" optional true
        file "*.{bai,csi}"
        script:
        prefix="$bam" - ~/(\.bam)?/
        size = "${params.large_ref}" ? '-c' : ''
        if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "discard"){
             """
             samtools view -h -b $bam -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam
             samtools index "${size}" ${prefix}.filtered.bam
             """
        } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "bam"){
             """
             samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
             samtools index "${size}" ${prefix}.filtered.bam
             """
        } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "fastq"){
             """
             samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
             samtools index "${size}" ${prefix}.filtered.bam
             samtools fastq -tn ${prefix}.unmapped.bam | pigz -p ${task.cpus} > ${prefix}.unmapped.fastq.gz
             rm ${prefix}.unmapped.bam
             """
        } else if("${params.bam_discard_unmapped}" && "${params.bam_unmapped_type}" == "both"){
             """
             samtools view -h $bam | tee >(samtools view - -@ ${task.cpus} -f4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.unmapped.bam) >(samtools view - -@ ${task.cpus} -F4 -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam)
             samtools index "${size}" ${prefix}.filtered.bam
             samtools fastq -tn ${prefix}.unmapped.bam | pigz -p ${task.cpus} > ${prefix}.unmapped.fastq.gz
             """
        } else { //Only apply quality filtering, default
             """
             samtools view -h -b $bam -@ ${task.cpus} -q ${params.bam_mapping_quality_threshold} -o ${prefix}.filtered.bam
             samtools index "${size}" ${prefix}.filtered.bam
             """
        }
    }

It even extracts the unmapped data to either BAM or FastQ, depending on the user's choice. I think we need something else :-(
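For readers unfamiliar with the flags in the process above: `-f4` keeps only records with SAM flag bit 0x4 (read unmapped) set, `-F4` keeps only records with it unset, and `-q` drops records whose MAPQ is below the threshold. A minimal Python sketch of that selection logic follows. The function name `passes_view_filter` and the example records are hypothetical and only illustrate the rule; this is not how samtools is implemented.

```python
UNMAPPED = 0x4  # SAM FLAG bit set when the read itself is unmapped


def passes_view_filter(flag, mapq, require=0, exclude=0, min_mapq=0):
    """Mimic `samtools view -f require -F exclude -q min_mapq` for one record."""
    if (flag & require) != require:   # -f: all required bits must be set
        return False
    if flag & exclude:                # -F: none of the excluded bits may be set
        return False
    return mapq >= min_mapq           # -q: skip records with MAPQ below threshold


# Hypothetical records: (read name, FLAG, MAPQ)
records = [
    ("read1", 0, 37),   # mapped, high MAPQ
    ("read2", 16, 10),  # mapped on the reverse strand, low MAPQ
    ("read3", 4, 0),    # unmapped
]

mapped = [n for n, f, q in records if passes_view_filter(f, q, exclude=UNMAPPED)]
unmapped = [n for n, f, q in records if passes_view_filter(f, q, require=UNMAPPED)]
mapped_q37 = [
    n for n, f, q in records
    if passes_view_filter(f, q, exclude=UNMAPPED, min_mapq=37)
]
```

With a non-zero threshold, `read2` ends up in neither the filtered nor the unmapped output, which is relevant to where in the pipeline the unmapped reads are collected.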

jfy133 · 2019-04-09T14:04:28Z

@maxibor are these FASTQ reads from before trimming and merging, but without human reads?

maxibor · 2019-04-09T14:05:47Z

> @maxibor are these fastq reads pre Trimming and merging, but without human reads?

In this PR, yes.

maxibor · 2019-04-09T14:08:42Z

> It even extracts the unmapped data to either BAM or FastQ, depending on the user's choice. I think we need something else :-(

But this step works on post-AdapterRemoval (AR) FASTQ files.

jfy133 · 2019-04-09T14:15:07Z

Sorry - that close was courtesy of Maia

jfy133 · 2019-04-09T14:23:07Z

I think you are good to go for testing once the Python version is fixed.

Minor thing: the help message/description is not specific about what is actually being output, which may explain Alex's confusion.

maxibor · 2019-04-09T15:08:23Z

Updated the conda env, so pysam should now be installed. The tests should still fail for now, though, because the Docker container has not yet been rebuilt with the new pysam dependency.

jfy133

(Official review)

Looking more in depth (please correct me if I misunderstand anything).

Major

I can't comment on the Python section. However:

The BAM file used as input for this tool should be the one produced immediately after BWA (sorted.bam), as it includes all mapped reads, not just those that pass the mapping-quality filter. For example, if you use the reads after samtools filter -q 37, the mapped reads discarded by that filter would not be stripped by the new module, because they are no longer in the mapped BAM file.
Is L921 flipped? Shouldn't it be if (params.singleEnd) {, as you only declare a single output file in that conditional block?
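The first point can be illustrated with a small sketch: the set of read names considered "mapped" shrinks once a MAPQ threshold has been applied, so low-quality host alignments would slip through if the stripping step only saw the filtered BAM. The function `mapped_names` and the records below are made up for illustration and do not come from the PR's script.

```python
def mapped_names(records, min_mapq=0):
    """Names of reads that are mapped (flag bit 0x4 unset) with MAPQ >= min_mapq."""
    return {
        name
        for name, flag, mapq in records
        if not (flag & 0x4) and mapq >= min_mapq
    }


# Hypothetical alignment records: (read name, FLAG, MAPQ)
records = [
    ("human_hi", 0, 37),  # confidently mapped to the reference
    ("human_lo", 0, 5),   # mapped, but below a -q 37 threshold
    ("microbe", 4, 0),    # unmapped
]

from_sorted_bam = mapped_names(records)                 # BAM straight after BWA
from_filtered_bam = mapped_names(records, min_mapq=37)  # BAM after -q 37 filtering

# Reads that map to the reference but would NOT be stripped
# if the filtered BAM were used as input.
missed = from_sorted_bam - from_filtered_bam
```

Here `missed` contains `human_lo`, the kind of read the review is concerned about.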

Minor

For the process name (L906), flag itself (L363, L218) and help message (L108), I suggest the following extra precision instead of -unmap:

--strip_input_fastq                       Create pre-Adapter Removal FASTQ files without reads that mapped to reference (e.g. for public upload of privacy sensitive non-host data)

publishDir should be renamed to something like /samtools/stripped_fastq rather than unmapped_fastq, as the latter already exists with the --extract_unmapped functionality.

Equally, the output FASTQ names should be e.g. stripped.fwd.fq.gz and stripped.rev.fq.gz or something similar. I think R1/R2 would be dangerous for novices re-analyzing the data with the same pipeline if they forget to add underscores (i.e. there would be two R1/R2s in the name). This may then lead to a funky input regex for an EAGER2 re-run and subsequent errors.

The actual names/flags can be discussed here - you don't have to go exactly with my 'stripped' suggestion.

maxibor · 2019-04-10T09:09:20Z

Followed the suggestions of @jfy133.
Also, the user can now choose the strip mode: either strip the mapped reads completely (--strip_mode strip) or just replace the sequence of the mapped reads with N (--strip_mode replace).
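As a rough illustration of the two modes, here is a stdlib-only sketch that works on in-memory FASTQ records and a set of mapped read names. It is not the script from this PR: the function `strip_fastq`, the record layout, and the choice to leave quality strings untouched in replace mode are all assumptions made for the example. In the actual script the mapped read names would presumably be collected from the BAM with pysam.

```python
def strip_fastq(records, mapped, mode="strip"):
    """Return FASTQ records with mapped reads removed or masked.

    records: iterable of (name, sequence, quality) tuples
    mapped:  set of read names that aligned to the reference
    mode:    "strip" drops mapped reads; "replace" masks their sequence with N
    """
    if mode not in ("strip", "replace"):
        raise ValueError("mode must be 'strip' or 'replace'")
    kept = []
    for name, seq, qual in records:
        if name not in mapped:
            kept.append((name, seq, qual))            # unmapped read: keep as is
        elif mode == "replace":
            kept.append((name, "N" * len(seq), qual))  # mapped read: mask bases
        # mode == "strip": mapped read is dropped entirely
    return kept


# Hypothetical input reads and mapped read names
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "GGCCA", "IIIII"),
]
mapped = {"read2"}

stripped = strip_fastq(reads, mapped, mode="strip")
replaced = strip_fastq(reads, mapped, mode="replace")
```

One practical difference: replace mode keeps the number and order of records unchanged, so forward and reverse files stay in sync record by record, whereas strip mode needs both mates removed consistently.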

jfy133 · 2019-04-10T09:16:16Z

@ivelsko owes you a beer ;)

maxibor · 2019-04-10T11:07:01Z

@apeltzer Can you have a look and merge if OK?
The tests are working with the updated docker image using the new conda env. Travis should be fixed after the docker image is rebuilt.

apeltzer

Looking good 👍

apeltzer · 2019-04-10T12:09:44Z

Just failing because of missing Python 3.6 and pysam. Merging to get the dev image updated 👍

addind unmapped reads to fastq

1e08342

apeltzer added the WIP Work in progress label Apr 9, 2019

maxibor requested a review from apeltzer April 9, 2019 13:06

maxibor mentioned this pull request Apr 9, 2019

fastq metagenomics data file without human reads #187

Closed

jfy133 closed this Apr 9, 2019

jfy133 reopened this Apr 9, 2019

maxibor added 4 commits April 9, 2019 16:56

fix input type

42d6b69

update conda env for pysam dependancy

fff1bf7

add doctrings

ebebbe5

update travis for unmap reads output

572f308

changelog update

363c0ac

maxibor requested a review from jfy133 April 9, 2019 15:13

jfy133 requested changes Apr 10, 2019

maxibor added 5 commits April 10, 2019 10:02

merging upstream commits

6fd6611

add replace or strip mode and change strip name

318d02d

update travis to test for read stripping

5a8cabb

update changelog with strip name change

923a0a6

merge with upstram

b9d6173

jfy133 approved these changes Apr 10, 2019

maxibor mentioned this pull request Apr 10, 2019

Fix reads in start log to also display readPaths #193

Closed

doc update

8f17924

apeltzer approved these changes Apr 10, 2019

apeltzer merged commit 4d2f062 into nf-core:dev Apr 10, 2019

maxibor mentioned this pull request Apr 13, 2019

Fix of #189 #197

Merged

2 tasks

apeltzer added a commit that referenced this pull request Apr 13, 2019

Merge pull request #197 from maxibor/dev

7a21c1b

Fix of #189
