HI-FEVER

High-throughput nextflow EVE recovery

HI-FEVER is a Nextflow workflow for finding endogenous viral elements (EVEs) in host genomes. It aims to address common issues in paleovirology, including cross-matches between host proteins and EVEs, the computational burden of EVE searches, and incompatibility between software packages or platforms. We provide HI-FEVER as an accessible and informative workflow for any EVE-discovery project.

Features

Protein-to-DNA based search allows detection of divergent and ancient EVEs
Designed to function with millions of input query proteins
Reconstructs the predicted EVE protein based on its closest modern match
Harnesses parallelisation to optimise compute resources
Scales from laptop to cluster
Conda and Docker compatible
Linux, Windows and macOS compatible

HI-FEVER provides a variety of output information about candidate EVEs, suited to many downstream purposes. Outputs include:

Genomic coordinates of candidate EVEs
Closest matches in the reciprocal databases, including full taxonomical information
Predicted EVE protein sequences and cDNA (frameshift and premature STOP codon aware), with extension beyond original hit
Extracted nucleotide sequence of each candidate EVE and flanking host genome sequence
Metadata & statistics of the genome assemblies screened

Installation and usage

HI-FEVER is available for use on Linux, Windows (WSL) and macOS through Conda and Docker. Full documentation can be found in the wiki.

Test run

To help you explore HI-FEVER's options, the steps below run the workflow on a test dataset. All data used for this test are available in the sample_run folder of our Open Science Framework repository.

Preparation

Ensure the required files are in the hi-fever/data folder:

query_20perfamNoRetro.fasta: protein query file
genome_human_ftp.txt: link to the human genome FTP
taxdump.tar.gz: taxonomy map file
MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz: minimal database built from the NCBI non-redundant database
MINI_rvdbv28_wtaxa.dmnd.tar.xz: minimal database built from the RVDB database
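
Before starting a run, it can help to confirm that every file in the list above is actually in place. The following is a minimal shell sketch and is not part of HI-FEVER: the function name check_required_files is hypothetical, and the file names are simply copied from the list above.

```shell
# Hypothetical helper (not shipped with HI-FEVER): report any test-run
# input that is missing from the given data folder (default: ./data).
check_required_files() {
    local data_dir="${1:-data}"
    local missing=0
    local f
    for f in \
        query_20perfamNoRetro.fasta \
        genome_human_ftp.txt \
        taxdump.tar.gz \
        MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz \
        MINI_rvdbv28_wtaxa.dmnd.tar.xz
    do
        if [ ! -f "${data_dir}/${f}" ]; then
            # Print each missing file to stderr and remember the failure.
            echo "missing: ${data_dir}/${f}" >&2
            missing=1
        fi
    done
    # Exit status 0 means every file was found, 1 means at least one is missing.
    return "${missing}"
}
```

Run from the root hi-fever folder, this could be used as: check_required_files data && echo "all inputs present"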

Extract the reciprocal databases with the following tar commands:

tar -xf MINI-nr_rep_seq-clustered_70id_80c_wtaxa.dmnd.tar.xz
tar -xf MINI_rvdbv28_wtaxa.dmnd.tar.xz

If using Conda, activate the environment. If using Docker on Mac (arm64), open a terminal tab within Docker Desktop. If using Docker on Linux, add the -with-docker flag to the run command below.

Run the HI-FEVER workflow from the root hi-fever folder with the following command (replacing the email address):

nextflow main.nf --query_file_aa 20_per_fam_no_retro.fasta --ftp_file human_T2T_ftp.txt --email john.smith@email.com
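
When switching between Conda and Docker runs, it may be convenient to assemble the command from variables and review it before executing it. The sketch below is one way to do that, under stated assumptions: build_hifever_cmd is a hypothetical wrapper that is not part of HI-FEVER, it only prints the command, and it appends Nextflow's standard -with-docker option when the fourth argument is "docker". The --query_file_aa, --ftp_file and --email parameters are the ones shown in the command above.

```shell
# Hypothetical wrapper (not part of HI-FEVER): print the run command so it
# can be reviewed before execution.
# Arguments: query file, FTP file, email address, optional "docker".
build_hifever_cmd() {
    local query="$1"
    local ftp="$2"
    local email="$3"
    local engine="${4:-}"
    local cmd="nextflow main.nf --query_file_aa ${query} --ftp_file ${ftp} --email ${email}"
    # -with-docker is Nextflow's own switch for running processes in containers.
    if [ "${engine}" = "docker" ]; then
        cmd="${cmd} -with-docker"
    fi
    printf '%s\n' "${cmd}"
}
```

Calling build_hifever_cmd 20_per_fam_no_retro.fasta human_T2T_ftp.txt john.smith@email.com prints the same command as shown above; adding docker as a fourth argument appends -with-docker. The printed line can then be pasted into the terminal once it looks right.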

This will generate a folder called output with two subfolders: accessory_fastas and sql. These outputs are detailed on our Usage page. For a guide on how to interpret these results, see our Interpreting results page.
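
A quick way to confirm that a run finished with the expected layout is to check for the two subfolders named above. This is a minimal sketch only: check_outputs is a hypothetical helper, and it assumes nothing about the files inside each subfolder.

```shell
# Hypothetical helper (not part of HI-FEVER): confirm that the output folder
# (default: ./output) contains the accessory_fastas and sql subfolders.
check_outputs() {
    local out_dir="${1:-output}"
    local status=0
    local sub
    for sub in accessory_fastas sql; do
        if [ -d "${out_dir}/${sub}" ]; then
            echo "found: ${out_dir}/${sub}"
        else
            echo "missing: ${out_dir}/${sub}" >&2
            status=1
        fi
    done
    # Exit status 0 means both subfolders exist.
    return "${status}"
}
```

For example, check_outputs output can be run from the root hi-fever folder after the workflow completes.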

Acknowledgements

HI-FEVER builds on the following libraries and programs, each distributed under its own license:

Biopython (https://biopython.org/)
Seqtk (https://github.com/lh3/seqtk)
DIAMOND (https://github.com/bbuchfink/diamond)
BBmap (https://github.com/BioInfoTools/BBMap)
BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
Entrez (https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html)
MMSeqs2 (https://github.com/soedinglab/MMseqs2)
Nextflow (https://www.nextflow.io/)
Python3 (https://www.python.org/)
Wise2 (https://www.ebi.ac.uk/~birney/wise2/)
Seqkit (https://bioinf.shenwei.me/seqkit/)
Bedtools (https://bedtools.readthedocs.io/en/latest/index.html)

Citation

Please include the following citation when using HI-FEVER in your projects.

Laura Muñoz-Baena, Emma F Harding, Jose Gabriel Nino Barreat, Cormac M Kinsella, Aris Katzourakis, HI-FEVER: a Nextflow pipeline for the high-throughput discovery and annotation of endogenous viral elements, Bioinformatics, 2025, btaf610, https://doi.org/10.1093/bioinformatics/btaf610
