Mouse Genome Support #52
This just needs tests for the mouse data now. Beyond that, we need to make the reference data available via iGenomes.
Add changes by Maxime Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
maxulysse
left a comment
Amazing job, thanks a lot for this PR.
Co-Authored-By: Maxime Garcia <max.u.garcia@gmail.com>
drpatelh
left a comment
Looks good to me 👍
If you haven't already, it will be worth checking that the .fai files match iGenomes in terms of chromosome size and ordering; otherwise it may break things.
ftp://ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa.fai
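A minimal sketch of the .fai check described above, assuming both index files are available as text. The function names `read_fai` and `compare_fai` are hypothetical and not part of Sarek or iGenomes; the sketch only relies on the standard .fai layout, where the first two tab-separated columns are the sequence name and its length.

```python
from typing import List, Tuple

FaiEntries = List[Tuple[str, int]]


def read_fai(text: str) -> FaiEntries:
    """Parse .fai content into (name, length) pairs, keeping file order."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        entries.append((fields[0], int(fields[1])))
    return entries


def compare_fai(a: FaiEntries, b: FaiEntries) -> List[str]:
    """Return human-readable differences between two parsed .fai files."""
    problems = []
    len_a, len_b = dict(a), dict(b)

    # Sequences present in only one of the two indexes.
    for name in sorted(len_a.keys() - len_b.keys()):
        problems.append(f"{name}: only in first index")
    for name in sorted(len_b.keys() - len_a.keys()):
        problems.append(f"{name}: only in second index")

    # Shared sequences whose lengths disagree.
    for name, length in a:
        if name in len_b and len_b[name] != length:
            problems.append(f"{name}: length {length} vs {len_b[name]}")

    # Relative ordering of the shared sequences.
    order_a = [name for name, _ in a if name in len_b]
    order_b = [name for name, _ in b if name in len_a]
    if order_a != order_b:
        problems.append("contig order differs")

    return problems


if __name__ == "__main__":
    # Toy data only; these lengths are NOT real GRCm38 values.
    first = read_fai("1\t1000\t3\t60\t61\n2\t800\t1023\t60\t61\n")
    second = read_fai("2\t800\t3\t60\t61\n1\t1000\t820\t60\t61\n")
    for problem in compare_fai(first, second):
        print(problem)
```

In practice the two inputs would be the contents of the Sanger .fai and the iGenomes .fai; an empty result means names, lengths, and ordering all agree.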
bwaIndex = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa.{amb,ann,bwt,pac,sa}"
chrDir = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes"
chrLength = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Length/GRCm38.len"
dbsnp = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/MouseGenomeProject/Annotation/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
Where did you download these files from, @apeltzer? They look like the latest, but it may be worth documenting that somewhere, e.g.:
dbSNP files for GRCm38 were downloaded from ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/ on [date].
These files are quite difficult to find, so it is worth being a bit more transparent whilst things are fresh!
I've added a PR to AWS-iGenomes where I'm adding these already:
I didn't know exactly where this info should go, but that felt like the logical place: others not using Sarek might use the files too, and they can then find out there where I got them from.
The .fai and .dict files match. I double-checked that and also ran local test data (~50GB) with these changes to check that everything is fine 👍
This starts adding support for mouse genome data to Sarek.
PR checklist
- Tests pass (nextflow run . -profile test,docker).
- Code lints (nf-core lint .).
- docs is updated
- CHANGELOG.md is updated
- README.md is updated

I will work on this ASAP, test locally and then provide both testing data AND the required reference data for iGenomes wherever possible / necessary.