Enable gzipped FastA input as reference genome#111

Merged
apeltzer merged 6 commits into nf-core:dev from apeltzer:zip_fasta
Dec 17, 2018

@apeltzer
Member

Adds support for gzipped FastA reference genome input.
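To illustrate the idea (this is only a minimal Python sketch, not the pipeline's actual Nextflow code): a reference that ends in `.gz` could be decompressed once into a working directory before indexing, while an uncompressed reference is passed through untouched. The function name `ensure_uncompressed_fasta` and the `workdir` parameter are hypothetical and exist only for this example.

```python
import gzip
import shutil
from pathlib import Path


def ensure_uncompressed_fasta(fasta: Path, workdir: Path) -> Path:
    """Return a path to an uncompressed FASTA file.

    Hypothetical helper: if `fasta` ends in ".gz" it is decompressed into
    `workdir`; otherwise the original path is returned unchanged.
    """
    if fasta.suffix != ".gz":
        return fasta

    workdir.mkdir(parents=True, exist_ok=True)
    # "ref.fasta.gz" -> "ref.fasta"
    target = workdir / fasta.with_suffix("").name

    # Stream the decompression so large genomes are not held in memory.
    with gzip.open(fasta, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Downstream indexing tools would then receive the returned path, regardless of whether the input was compressed.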

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated

@apeltzer
Member Author

This should add support, following on from #91.

@apeltzer apeltzer requested a review from jfy133 December 15, 2018 22:13
@apeltzer
Member Author

@jfy133 please review this, then you may request changes and/or merge it :-)

@apeltzer
Member Author

I would like to stick to the review/merge pattern from now on to keep things protected here :-)

Member

@jfy133 jfy133 left a comment

The only potential issue I see: if we assume that someone wants compressed FASTAs in the first place, then when the --saveReference flag is used we would want to re-compress the saved FASTA once the reference files are no longer needed in the pipeline.

This would then save disk space when that particular file is not being used - which I guess was my motivation for that feature request.

@apeltzer
Member Author

Hm, I don't get what you mean by this:

  • Input FastA.gz (solved by this already)
  • Index creation
  • Usage in the pipeline
  • ... ?

Zipping the index doesn't make much sense, as we'd have to uncompress it every time we use the index again before running something in a pipeline (which is too much overhead).

Or do you mean we should save the indexed reference genome as a compressed zip archive as well?

@jfy133
Member

jfy133 commented Dec 17, 2018

> Hm, I don't get what you mean by this:
>
>   • Input FastA.gz (solved by this already)
>   • Index creation
>   • Usage in the pipeline
>   • ... ?
>
> Zipping the index doesn't make much sense, as we'd have to uncompress it every time we use the index again before running something in a pipeline (which is too much overhead).
>
> Or do you mean we should save the indexed reference genome as a compressed zip archive as well?

The latter. But this is still a rare case I imagine.

Just accepting a gzipped reference the first time the reference is used is sufficient for this functionality as implemented here (e.g. genomes downloaded from NCBI are gzipped).

Maybe keep this commit as it is for the moment. If someone else requests a recompressed indexed FASTA we can consider that later.

@apeltzer apeltzer merged commit 2a7e70e into nf-core:dev Dec 17, 2018
@apeltzer apeltzer deleted the zip_fasta branch December 17, 2018 12:44