This repository was archived by the owner on Jan 27, 2020. It is now read-only.

Commit 53ad6f9

Better GRCH38 Documentation for input files
1 parent 359bb3f commit 53ad6f9

1 file changed

Lines changed: 42 additions & 22 deletions

docs/REFERENCES.md

@@ -17,20 +17,57 @@ The following files need to be downloaded:
1717
- dd05833f18c22cc501e3e31406d140b0 - 'human\_g1k\_v37\_decoy.fasta.gz'
1818
- a0764a80311aee369375c5c7dda7e266 - 'Mills\_and\_1000G\_gold\_standard.indels.b37.vcf.gz'
1919

20-
### Other files
20+
### Other files for GRCh37
2121

2222
From our repo, get the [`intervals` list file](https://raw.githubusercontent.com/SciLifeLab/Sarek/master/repeats/wgs_calling_regions.grch37.list). More information about this file is in the [intervals documentation](INTERVALS.md).
2323

2424
Description of how to generate the Loci file used in the ASCAT process is described [here](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md).
2525

26-
You can create your own cosmic reference for any human reference as specified below.
26+
You can create your own COSMIC reference for any human reference as described in the COSMIC files section below.
2727

28-
### COSMIC files
28+
## GRCh38
29+
30+
Use `--genome GRCh38` to map against GRCh38. If you are not on UPPMAX, first adjust the settings in `genomes.config` to your needs.
31+
32+
To get the needed files, download the GATK bundle for GRCh38 from [ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/](ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/). You can also download the required files from the Google Cloud mirror link [here](https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0).
33+
34+
The MD5 checksum of the `Homo_sapiens_assembly38.fasta` file included in that bundle is 7ff134953dcca8c8997453bbb80b6b5e.
35+
36+
If you download the data from the FTP server's `beta/` directory, which seems to be an older version of the bundle, only `Homo_sapiens_assembly38.known_indels.vcf` is needed. You can also omit the `dbsnp_138_` and `dbsnp_144` files, as we use `dbsnp_146`; the older files also use the wrong chromosome naming convention. The Google Cloud mirror has all data in the `v0` directory, but you need to remove the `resources_broad_hg38_v0_` prefix from every file name.
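
The prefix removal for files fetched from the Google Cloud mirror can be scripted. The following is a minimal sketch, not part of Sarek: the function name `strip_prefix` is hypothetical, and it assumes all downloaded files sit in a single directory.

```shell
# Remove a fixed prefix from every matching file name in a directory.
# Usage: strip_prefix <directory> <prefix>
# NOTE: strip_prefix is an illustrative helper, not a Sarek script.
strip_prefix() {
    local dir prefix path name target
    dir="$1"
    prefix="$2"
    for path in "$dir"/"$prefix"*; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$path" ] || continue
        name="$(basename "$path")"
        target="$dir/${name#"$prefix"}"
        # Never overwrite an existing file of the same name.
        if [ -e "$target" ]; then
            echo "skipping $name: $target already exists" >&2
            continue
        fi
        mv "$path" "$target"
    done
    return 0
}
```

For example, `strip_prefix . resources_broad_hg38_v0_` would rename the files in the current directory. Files without the prefix are left untouched.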
37+
38+
The following files need to be downloaded:
39+
40+
- 3884c62eb0e53fa92459ed9bff133ae6 - 'Homo_sapiens_assembly38.dict'
41+
- 7ff134953dcca8c8997453bbb80b6b5e - 'Homo_sapiens_assembly38.fasta'
42+
- b07e65aa4425bc365141756f5c98328c - 'Homo_sapiens_assembly38.fasta.64.alt'
43+
- e4dc4fdb7358198e0847106599520aa9 - 'Homo_sapiens_assembly38.fasta.64.amb'
44+
- af611ed0bb9487fb1ba4aa1a7e7ad21c - 'Homo_sapiens_assembly38.fasta.64.ann'
45+
- d41d8cd98f00b204e9800998ecf8427e - 'Homo_sapiens_assembly38.fasta.64.bwt'
46+
- 178862a79b043a2f974ef10e3877ef86 - 'Homo_sapiens_assembly38.fasta.64.pac'
47+
- 91a5d5ed3986db8a74782e5f4519eb5f - 'Homo_sapiens_assembly38.fasta.64.sa'
48+
- f76371b113734a56cde236bc0372de0a - 'Homo_sapiens_assembly38.fasta.fai'
49+
- 14cc588a271951ac1806f9be895fb51f - 'Homo_sapiens_assembly38.known_indels.vcf.gz'
50+
- 1a55fdfa6533ae5cbc70e8188e779229 - 'Homo_sapiens_assembly38.known_indels.vcf.gz.tbi'
51+
- 2e02696032dcfe95ff0324f4a13508e3 - 'Mills_and_1000G_gold_standard.indels.hg38.vcf.gz'
52+
- 4c807e2cbe0752c0c44ac82ff3b52025 - 'Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi'
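
The checksums listed above can be compared against the downloaded files. This is a minimal sketch under stated assumptions: the helper name `verify_md5` is hypothetical, and `md5sum` from GNU coreutils is assumed to be installed. Note that `d41d8cd98f00b204e9800998ecf8427e`, listed for the `.bwt` file, is the MD5 of an empty file, so that entry is worth double-checking against your own download.

```shell
# Compare a file's MD5 checksum against an expected value.
# Usage: verify_md5 <file> <expected_md5>
# Returns 0 on match, 1 on mismatch, 2 if the file is missing.
# NOTE: verify_md5 is an illustrative helper, not a Sarek script.
verify_md5() {
    local file expected actual
    file="$1"
    expected="$2"
    if [ ! -f "$file" ]; then
        echo "MISSING $file" >&2
        return 2
    fi
    # md5sum prints "<hash>  <file>"; keep only the hash.
    actual="$(md5sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK $file"
        return 0
    fi
    echo "MISMATCH $file (expected $expected, got $actual)" >&2
    return 1
}
```

For example, `verify_md5 Homo_sapiens_assembly38.fasta 7ff134953dcca8c8997453bbb80b6b5e` checks the reference FASTA against the value given in this document.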
53+
54+
If you downloaded only the `Homo_sapiens_assembly38.fasta.gz` file, decompress it and build the BWA index yourself:
55+
56+
```
57+
gunzip Homo_sapiens_assembly38.fasta.gz
58+
bwa index -6 Homo_sapiens_assembly38.fasta
59+
```
60+
61+
How to generate the Loci file used in the ASCAT process is described [here](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md).
2962

30-
To annotate with COSMIC variants during MuTect1/2 Variant Calling you need to create a compatible VCF file.
63+
You can create your own COSMIC reference for any human reference as described in the COSMIC files section below.
64+
65+
## COSMIC files
66+
67+
To annotate with COSMIC variants during MuTect1/2 Variant Calling you need to create a compatible VCF file.
3168
Download the coding and non-coding VCF files from [COSMIC](http://cancer.sanger.ac.uk/cosmic/download) and
3269
process them with the [Create\_Cosmic.sh](https://github.com/SciLifeLab/Sarek/tree/master/scripts/Create_Cosmic.sh)
33-
script. The script requires a fasta index `.fai`, of the reference file you are using.
70+
script for either GRCh37 or GRCh38. The script requires the FASTA index (`.fai`) of the reference file you are using.
3471

3572
Example:
3673

@@ -47,23 +84,6 @@ To index the resulting VCF file use [igvtools](https://software.broadinstitute.o
4784
igvtools index <cosmicvxx.vcf>
4885
```
4986

50-
## GRCh38
51-
52-
Use `--genome GRCh38` to map against GRCh38. Before doing so and if you are not on UPPMAX, you need to adjust the settings in `genomes.config` to your needs.
53-
54-
To get the needed files, download the GATK bundle for GRCh38 from [ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/](ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/).
55-
56-
The MD5SUM of `Homo_sapiens_assembly38.fasta` included in that file is 7ff134953dcca8c8997453bbb80b6b5e.
57-
58-
From the `beta/` directory, which seems to be an older version of the bundle, only `Homo_sapiens_assembly38.known_indels.vcf` is needed. Also, you can omit `dbsnp_138_` and `dbsnp_144` files as we use `dbsnp_146`. The old ones also use the wrong chromosome naming convention.
59-
60-
Afterwards, the following needs to be done:
61-
62-
```
63-
gunzip Homo_sapiens_assembly38.fasta.gz
64-
bwa index -6 Homo_sapiens_assembly38.fasta
65-
```
66-
6787
## smallGRCh37
6888

6989
Use `--genome smallGRCh37` to map against a small reference genome based on GRCh37. `smallGRCh37` is the default genome for the testing profile (`-profile testing`).
