HI-FEVER

High-throughput nextflow EVE recovery

HI-FEVER is a Nextflow workflow for finding endogenous viral elements (EVEs) in host genomes. It aims to address common issues in paleovirology, including cross-matches between host proteins and EVEs, the computational burden of EVE searches, and incompatibility between software packages or platforms. We provide HI-FEVER as an accessible and informative workflow for any EVE-discovery project.

Features

  • Protein-to-DNA based search allows detection of divergent and ancient EVEs
  • Designed to function with millions of input query proteins
  • Reconstructs the predicted EVE protein based on its closest modern match
  • Harnesses parallelisation to optimise compute resources
  • Scales from laptop to cluster
  • Conda and Docker compatible
  • Linux, Windows and macOS compatible

HI-FEVER provides a variety of output information about candidate EVEs, suited to many downstream purposes. Outputs include:

  • Genomic coordinates of candidate EVEs
  • Closest matches in the reciprocal databases, including full taxonomic information
  • Predicted EVE protein sequences and cDNA (frameshift and premature STOP codon aware), with extension beyond original hit
  • Extracted nucleotide sequence of each candidate EVE and flanking host genome sequence
  • Metadata & statistics of the genome assemblies screened

Installation and usage

HI-FEVER is available for use on Linux, Windows (WSL) and macOS through Conda and Docker. Full documentation can be found in the wiki.

Test run

To experiment with and explore HI-FEVER options, we provide instructions for running a test dataset below. All data used for this test are available in the sample_run folder of our Open Science Framework repository.

Preparation

Ensure the required files are in the hi-fever/data folder:

  • query_20perfamNoRetro.fasta: the protein query file
  • genome_human_ftp.txt: the FTP link to the human genome
  • taxdump.tar.gz: the taxonomy map file
  • MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz: the minimal database built from the NCBI non-redundant database
  • MINI_rvdbv28_wtaxa.dmnd.tar.xz: the minimal database built from the RVDB database
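
Before extracting anything, it can help to confirm that all five files listed above are in place. The following is a minimal sketch, not part of HI-FEVER: check_required_files is a hypothetical helper, and it assumes the file names exactly as listed above and a data folder passed as its first argument (defaulting to data).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HI-FEVER): report any test-run
# input file that is missing from the given data folder.
check_required_files() {
    local data_dir="${1:-data}"
    local missing=0
    local f
    for f in \
        query_20perfamNoRetro.fasta \
        genome_human_ftp.txt \
        taxdump.tar.gz \
        MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz \
        MINI_rvdbv28_wtaxa.dmnd.tar.xz
    do
        if [ ! -f "$data_dir/$f" ]; then
            echo "missing: $data_dir/$f"
            missing=1
        fi
    done
    # Exit status 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

From the root hi-fever folder, `check_required_files data && echo "all inputs present"` prints a confirmation only when nothing is missing.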

Extract the reciprocal databases with the following tar commands:

tar -xf MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz
tar -xf MINI_rvdbv28_wtaxa.dmnd.tar.xz

If using Conda, activate the environment. If using Docker on macOS (arm64), open a terminal tab within Docker Desktop. If using Docker on Linux, add the -with-docker flag to the run command below.

Run the HI-FEVER workflow from the root hi-fever folder with the following command (replacing the email address):

nextflow main.nf --query_file_aa 20_per_fam_no_retro.fasta --ftp_file human_T2T_ftp.txt --email john.smith@email.com
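
For Docker on Linux, the run command takes Nextflow's Docker option, which Nextflow spells -with-docker. The sketch below only prints the command so it can be inspected before running. hifever_test_cmd is a hypothetical helper, and appending the flag at the end of the command is an assumption; the query and FTP file names are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HI-FEVER): print the test-run
# command, optionally with Nextflow's -with-docker option appended.
# Usage: hifever_test_cmd <email> [docker]
hifever_test_cmd() {
    local email="$1"
    local mode="${2:-}"
    local cmd="nextflow main.nf --query_file_aa 20_per_fam_no_retro.fasta --ftp_file human_T2T_ftp.txt --email $email"
    if [ "$mode" = "docker" ]; then
        # Assumption: the flag is appended after the pipeline parameters.
        cmd="$cmd -with-docker"
    fi
    echo "$cmd"
}
```

Calling `hifever_test_cmd john.smith@email.com docker` prints the command with the Docker flag added. Omitting the second argument prints the command exactly as shown above.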

This will generate a folder called output with two subfolders: accessory_fastas and sql. These outputs are detailed on our Usage page. For a guide on how to interpret these results, see our Interpreting results page.
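
A quick way to confirm that the test run produced the expected layout is to check for both subfolders. This is a minimal sketch: check_outputs is a hypothetical helper, not part of HI-FEVER, and it relies only on the folder names given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with HI-FEVER): check that the output
# folder contains the two subfolders described for the test run.
check_outputs() {
    local out_dir="${1:-output}"
    local status=0
    local sub
    for sub in accessory_fastas sql; do
        if [ -d "$out_dir/$sub" ]; then
            echo "found: $out_dir/$sub"
        else
            echo "missing: $out_dir/$sub"
            status=1
        fi
    done
    # Exit status 0 only when both subfolders exist.
    return "$status"
}
```

Run `check_outputs output` from the root hi-fever folder once the workflow has finished.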

Acknowledgements

HI-FEVER is based on the following libraries and programs directory along with their license:

Citation

Please include the following citation when using HI-FEVER in your projects.

Laura Muñoz-Baena, Emma F Harding, Jose Gabriel Nino Barreat, Cormac M Kinsella, Aris Katzourakis, HI-FEVER: a Nextflow pipeline for the high-throughput discovery and annotation of endogenous viral elements, Bioinformatics, 2025, btaf610, https://doi.org/10.1093/bioinformatics/btaf610
