concat gz files #338
Hi all, I'm passing by thanks to @apeltzer https://twitter.com/alex_peltzer/status/1222317924486127616 :-) I cannot test the workflow for now, but for this line: `gzcat *.gz | gzip > out.gz` you don't need to gunzip and re-gzip the files. A simple `cat` works and will be faster, because several gzip files concatenated together still form a valid gzip file. See https://stackoverflow.com/questions/8005114
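To illustrate the suggestion, here is a minimal sketch that compares the two approaches on two tiny test files. The file names and contents are made up for the example, and `gzip -dc` is used in place of `gzcat` only because it is available on both Linux and macOS; this is not the workflow's actual command.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory; all file names here are illustrative.
workdir=$(mktemp -d)
cd "$workdir"

# Two small gzip files standing in for the real inputs.
printf 'read1\n' | gzip > a.fastq.gz
printf 'read2\n' | gzip > b.fastq.gz

# Approach from the line in question: decompress everything, then recompress.
gzip -dc a.fastq.gz b.fastq.gz | gzip > recompressed.gz

# Suggested approach: concatenate the compressed files directly.
cat a.fastq.gz b.fastq.gz > concatenated.gz

# Both outputs should decompress to the same stream.
gzip -dc recompressed.gz > recompressed.txt
gzip -dc concatenated.gz > concatenated.txt
cmp recompressed.txt concatenated.txt && echo "identical"
```

The `cat` variant never decompresses or recompresses anything, which is where the time saving comes from.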
|
Thanks @lindenb! Very good point! The current master is very 'old' now, and there is a 'rebuilt' version in dev which is much more mature and should be released in a couple of months. That said, I see that zcat is still used there. I will close this PR, and feel free to make the changes to the [...]. Otherwise I'll add an issue and will make the switch to [...].
|
Hi again, sorry if this is not the best place to ask a question about this workflow. For the line https://github.com/nf-core/eager/blob/master/main.nf#L647 you're concatenating all the *.gz files and treating them as a single-end file. Wouldn't it be a better strategy to keep the unmerged paired files apart from the single ends and map them as real paired-end data?
|
No problem. In ancient DNA we typically have such short reads that almost everything gets merged (>90%). The number of unmerged paired reads is so low that it's just easier to lump them into one file (so basically the same command as SE mapping), as we get very little extra info from keeping them unmerged and mapping them separately. We could consider having an option to map them separately if there are more use cases though. @apeltzer can advise more on the reasoning behind this decision.
|
@jfy133 thanks!
|
@lindenb there is also an EAGER channel on the nf-core slack if you prefer https://nf-co.re/join/slack |