Enable gzipped FastA input as reference genome #111
Should add support, following on from #91.
@jfy133 please review this, then you may request changes and/or merge it :-)
I would like to stick to a review / merge pattern from now on to keep things protected here :-)
The only potential issue I see: if we assume someone wants compressed FASTAs in the first place, then when the --saveReference flag is used we would want to re-compress the saved FASTA once the reference files are no longer needed in the pipeline.
This would save disk space while that particular file is not in use, which I guess was my motivation for the feature request.
Hm, I don't get what you mean by this:
Zipping the index doesn't make much sense, as we'd have to uncompress it every time we use the index again before running something in a pipeline (which is too much overhead). Or do you mean we should also save the indexed reference genome as a compressed zip archive?
The latter. But I imagine this is still a rare case. Just accepting a gzipped reference the first time the reference is used would be sufficient for this functionality as implemented here (e.g. genomes downloaded from NCBI are gzipped). Maybe keep this commit as it is for the moment. If someone else requests a recompressed indexed FASTA, we can consider that later.
Adds support for gzipped FastA reference genome input.
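As an illustration only, the idea of accepting either a plain or a gzipped FastA could be sketched in Python as below. This is not the pipeline's actual Nextflow implementation; the helper names `is_gzipped` and `prepare_reference` are hypothetical, and detection via the gzip magic bytes (rather than the file extension) is an assumption made for this sketch.

```python
import gzip
import shutil
from pathlib import Path

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def prepare_reference(fasta, workdir):
    """Return a path to an uncompressed FastA (hypothetical helper).

    A plain FastA is returned unchanged; a gzipped one is
    decompressed into workdir and the new path is returned.
    """
    fasta = Path(fasta)
    if not is_gzipped(fasta):
        return fasta
    # Drop a trailing ".gz" if present, otherwise append ".fa".
    name = fasta.name[:-3] if fasta.name.endswith(".gz") else fasta.name + ".fa"
    target = Path(workdir) / name
    # Stream the decompression so large genomes are not held in memory.
    with gzip.open(fasta, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Indexing tools would then be pointed at the returned path, so the decompression happens once, the first time the reference is used.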
PR checklist
- nextflow run . -profile test,docker
- nf-core lint .
- docs is updated
- CHANGELOG.md is updated