Mouse Genome Support #52

Merged
maxulysse merged 36 commits into nf-core:dev from apeltzer:dev on Nov 4, 2019

@apeltzer
Member

@apeltzer apeltzer commented Oct 21, 2019

This starts adding support for mouse genome data to Sarek.

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

I will work on this ASAP, test locally and then provide both testing data AND the required reference data for iGenomes wherever possible / necessary.

Comment thread conf/igenomes.config Outdated
Comment thread conf/igenomes.config Outdated
Comment thread conf/igenomes.config Outdated
Comment thread conf/genomes.config
@apeltzer apeltzer marked this pull request as ready for review October 31, 2019 09:51
@apeltzer
Member Author

This just needs tests for the mouse data now. Apart from that, we need to make the reference data available via iGenomes.

@apeltzer apeltzer changed the title from "WIP: Mouse Genome Support" to "Mouse Genome Support" Oct 31, 2019
@maxulysse maxulysse requested a review from a team October 31, 2019 10:15
@maxulysse maxulysse added the enhancement New feature or request label Oct 31, 2019
Comment thread docs/reference.md Outdated
Comment thread docs/reference.md Outdated
Add changes by Maxime

Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
Member

@maxulysse maxulysse left a comment

Amazing job, thanks a lot for this PR.

@maxulysse maxulysse requested a review from a team October 31, 2019 12:05
Comment thread CHANGELOG.md Outdated
Comment thread CHANGELOG.md Outdated
Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
Member

@drpatelh drpatelh left a comment

Looks good to me 👍

If you haven't already, it would be worth checking that the .fai files match iGenomes in terms of chromosome size and ordering; otherwise it may break things.
ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa.fai
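
A minimal sketch of such a check, assuming both indexes are available locally as standard plain-text .fai files (tab-separated, contig name in column 1, length in column 2). The function names `read_fai` and `compare_fai` are hypothetical helpers for illustration, not part of Sarek or iGenomes.

```python
from typing import List, Tuple

Contig = Tuple[str, int]


def read_fai(text: str) -> List[Contig]:
    """Parse .fai text into (name, length) pairs, preserving file order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        entries.append((fields[0], int(fields[1])))
    return entries


def compare_fai(a: List[Contig], b: List[Contig]) -> List[str]:
    """Return a list of human-readable differences (empty if the indexes agree)."""
    problems = []
    # Ordering matters for many tools, so compare the name lists positionally.
    if [name for name, _ in a] != [name for name, _ in b]:
        problems.append("contig names or ordering differ")
    # Compare lengths for every contig present in both indexes.
    lengths_b = dict(b)
    for name, length in a:
        if name in lengths_b and lengths_b[name] != length:
            problems.append(f"{name}: length {length} != {lengths_b[name]}")
    return problems


# Hypothetical usage with two local files:
#   with open("sanger.fa.fai") as fa, open("igenomes.fa.fai") as fb:
#       print(compare_fai(read_fai(fa.read()), read_fai(fb.read())))
```

An empty result means the names, order and lengths agree. Any entry in the returned list points at something to resolve before using the two references together.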

Comment thread conf/igenomes.config
bwaIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}"
chrDir = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes"
chrLength = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Length/GRCm38.len"
dbsnp = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
Member

Where did you download these files from, @apeltzer? They look like the latest, but it may be worth documenting that somewhere, e.g.:

dbSNP files for GRCm38 were downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/ on [date].

Member

These files are quite difficult to find, so it's worth being a bit more transparent whilst things are fresh!

Member Author

I've added a PR to AWS-iGenomes where I'm adding these already:

ewels/AWS-iGenomes#7

I didn't know exactly where this info should go, but that felt like the logical place: others not using Sarek might use the files too, and they can then find out there where I got them from.

The .fai files and .dict match. I double-checked that and also ran local test data (~50GB) with these changes to check that everything is fine 👍
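
For anyone wanting to repeat the .fai/.dict comparison, here is a rough sketch. It assumes the .dict is a standard SAM-header-style sequence dictionary with `@SQ` lines carrying `SN:` and `LN:` tags, and that the .fai is the usual tab-separated index. The helper names are hypothetical, and this is not necessarily how the check above was done.

```python
from typing import List, Tuple

Contig = Tuple[str, int]


def read_fai(text: str) -> List[Contig]:
    """Parse .fai text into (name, length) pairs in file order."""
    return [
        (fields[0], int(fields[1]))
        for fields in (line.split("\t") for line in text.splitlines() if line.strip())
    ]


def read_dict(text: str) -> List[Contig]:
    """Parse the @SQ lines of a sequence dictionary into (name, length) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD and any other header lines
        # Each field looks like TAG:value; split only on the first colon
        # so values such as UR:file:/path stay intact.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        entries.append((tags["SN"], int(tags["LN"])))
    return entries


def dict_matches_fai(dict_text: str, fai_text: str) -> bool:
    """True if both files list the same contigs, lengths and order."""
    return read_dict(dict_text) == read_fai(fai_text)


# Hypothetical usage with local files:
#   with open("genome.dict") as d, open("genome.fa.fai") as f:
#       print(dict_matches_fai(d.read(), f.read()))
```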

Labels

enhancement New feature or request

3 participants