raveancic/fromhaplomulti-VCF2FASTA

From haploid multiVCF 2 FASTA

This repo contains a Jupyter notebook for converting a haploid multi-VCF into a multi-FASTA file that is ready to be imported into other software for further analyses. The script relies on many functions of the wonderful scikit-allel package and was created out of the need for a tool that performs this conversion for downstream phylogenetic analyses.

It is especially suited to studies (research on uniparental markers) that include a multi-VCF:

  • created by merging not only VCFs produced in-house from FASTQ or BAM files but also VCFs provided by others or downloaded from public repositories (for which the raw data cannot be obtained).
  • that has passed QC, in which the SNPs have been manually curated and the informative SNPs are included.

Other tools, such as bcftools, can generate a consensus FASTA sequence from a VCF, but they require a reference sequence, which makes the resulting FASTA much longer than the set of informative SNPs required for a phylogenetic analysis. Selecting specific positions and/or regions makes the process cumbersome, especially if all the information you need is already in the VCF you have just produced.
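To illustrate the reference-free idea described above, here is a minimal sketch that concatenates each sample's called allele at every SNP into one sequence per sample. It is not the notebook's code: the notebook uses scikit-allel, while this sketch uses only the Python standard library so it can run anywhere. The function name `haploid_vcf_to_fasta`, the `missing` parameter, and the tiny example VCF are all hypothetical, and the sketch assumes haploid `GT` calls (a single allele index per sample).

```python
import io


def haploid_vcf_to_fasta(vcf_text, missing="N"):
    """Build a SNP-only multi-FASTA string from haploid multi-sample VCF text.

    Hypothetical helper for illustration; not part of the notebook.
    """
    samples = []
    sequences = []
    for line in io.StringIO(vcf_text):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            sequences = [[] for _ in samples]
            continue
        alleles = [fields[3]] + fields[4].split(",")  # REF first, then ALTs
        # Keep only sites where every allele is a single nucleotide (SNPs).
        if any(len(a) != 1 or a not in "ACGT" for a in alleles):
            continue
        gt_index = fields[8].split(":").index("GT")
        for seq, call in zip(sequences, fields[9:]):
            gt = call.split(":")[gt_index]
            # A haploid call is one allele index; anything else
            # (".", "0/1", ...) is written as the missing symbol.
            seq.append(alleles[int(gt)] if gt.isdigit() else missing)
    records = []
    for name, seq in zip(samples, sequences):
        records.append(">" + name + "\n" + "".join(seq) + "\n")
    return "".join(records)


# Made-up example: two SNPs and one insertion (which is skipped).
example_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "Y\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0\t1\t1",
    "Y\t200\t.\tC\tT,G\t.\tPASS\t.\tGT\t2\t0\t.",
    "Y\t300\t.\tG\tGA\t.\tPASS\t.\tGT\t0\t1\t0",
])

print(haploid_vcf_to_fasta(example_vcf), end="")
# >S1
# AG
# >S2
# GC
# >S3
# GN
```

Each output sequence is only as long as the number of SNPs kept, which is the contrast with a reference-based consensus sequence. Writing missing calls as `N` is a choice made for this sketch; the notebook may handle them differently.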

The script is commented, and the workflow is described in the notebook. Suggestions and contributions are welcome!

Please cite Colombo, G.; Traverso, L. et al., Overview of the Americas’ First Peopling from a Patrilineal Perspective: New Evidence from the Southern Continent. Genes 2022, if you use this script!
