Here is the pipeline used to generate this set of variants causing exon skipping:
# BEFORE CURATION
1. run query_hgmd_splice_MySQL.py to produce hgmd_splice_variants.tsv
2. pull out the PMIDs from hgmd_splice_variants.tsv into hgmd_splice_pmids.txt (trivial, can use cut or some other command line utility)
3. run scrape_pubmed_abstracts.py which takes hgmd_splice_pmids.txt as input, and produces exon_skipping_pmids.txt
4. run map_pmids_to_variants.py which takes exon_skipping_pmids.txt and hgmd_splice_variants.tsv as input, and produces exon_skipping.prelim.vcf
5. annotate preliminary VCF file with VEP (this is just to collect useful info for the manual curation step)
6. run tableize_vep_vcf.py on the annotated VCF to get the data into spreadsheet format (exon_skipping.prelim.tsv)
7. finally, run create_spreadsheet.R to get exon_skipping_spreadsheet.tsv. This will serve as the medium for manual curation.
Note: merge_spreadsheets.R merges manual annotations from a partially curated spreadsheet with a newly generated spreadsheet
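Step 2 above has no script of its own. As a minimal sketch of one way to do it in Python instead of cut, the function below collects the unique values of a PMID column from tab-separated input. It assumes the TSV has a header row and that the PMID column is named "pmid"; both are assumptions, since the actual layout of hgmd_splice_variants.tsv is not described here. The function name extract_pmids and the example rows are made up for illustration.

```python
import csv
import io


def extract_pmids(tsv_lines, column="pmid"):
    """Return the unique, non-empty values of `column`, in first-seen order.

    `column` is an assumed header name; adjust it to the real TSV header.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    seen = set()
    pmids = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            pmids.append(value)
    return pmids


# Made-up rows, only to show the behaviour on duplicate PMIDs.
example = io.StringIO(
    "chrom\tpos\tpmid\n"
    "1\t100\t111\n"
    "2\t200\t222\n"
    "3\t300\t111\n"
)
print(extract_pmids(example))  # prints ['111', '222']
```

To use it on the real file, pass an open file handle for hgmd_splice_variants.tsv and write the returned list, one PMID per line, to hgmd_splice_pmids.txt.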
# AFTER CURATION
python convert_tsv_to_vcf.py -i curated_spreadsheet.tsv --info status,donorDist,acceptorDist,type | grep -v "status=discard" | grep -v "type=as" > donor_skip.vcf
python convert_tsv_to_vcf.py -i curated_spreadsheet.tsv --info status,donorDist,acceptorDist,type | grep -v "status=discard" | grep -v "type=ds" > acceptor_skip.vcf
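The two grep -v filters in the commands above can be sketched in Python as follows, which may help when the filtering needs to run somewhere grep is not available. This mirrors the plain substring matching that grep does, so it shares the same caveat: any line containing the text anywhere is dropped, and header lines pass through untouched. The function name filter_records, the example records, and the status value "ok" are made up for illustration; only "status=discard", "type=as" and "type=ds" come from the commands above.

```python
def filter_records(vcf_lines, excluded_type):
    """Mimic: grep -v "status=discard" | grep -v "type=<excluded_type>"."""
    kept = []
    for line in vcf_lines:
        if "status=discard" in line:
            continue  # first grep -v: drop discarded variants
        if f"type={excluded_type}" in line:
            continue  # second grep -v: drop the other site type
        kept.append(line)
    return kept


# Made-up records; the INFO keys follow the --info list used above.
records = [
    "##fileformat=VCFv4.1",
    "1\t100\t.\tA\tG\t.\t.\tstatus=ok;donorDist=1;acceptorDist=50;type=ds",
    "1\t200\t.\tC\tT\t.\t.\tstatus=ok;donorDist=80;acceptorDist=2;type=as",
    "1\t300\t.\tG\tA\t.\t.\tstatus=discard;donorDist=3;acceptorDist=40;type=ds",
]

donor_skip = filter_records(records, "as")      # contents of donor_skip.vcf
acceptor_skip = filter_records(records, "ds")   # contents of acceptor_skip.vcf
```

Note that a record whose type is neither "as" nor "ds" would end up in both outputs, exactly as it would with the grep pipeline.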
# Other notes
convert_tsv_to_vcf.py is in maclab_scripts/vcf/
tableize_vep_vcf.py is in maclab_scripts/vep/