HIV-ONT-Datasets

Alejandro Gener (ARG)

Practice dataset for native viral RNA sequencing and assembly. These data provide a model (ground truth) for direct viral haplotyping for SNVs at least 6 kb apart. Boo quasispecies. This information was made available for the COVID-19 Biohackathon April 5-11 2020.

Background

The Tg26 HIV mouse has gagpol-deleted HIV integrated artificially (inserted) into the mouse germline (Dickie, et al. 1991). These insertion sites have not been characterized at depth. A recent PCR-free gDNA sequencing run (unpublished) supported 15 HIV copies across multiple insertion sites (Table 1). At the time, that study was limited by inherrent shortcomings of short read sequencing (paired-end 150, average insert sized 511). Variants in HIV copies further apart than 511 bases would not be able to be phased with this short-read data. ONT DNA error profiles approximate native RNA profiles, and improve as new basecalling models develop (Gener, 2019; Gener and Kimata, 2019).

Method

One Tg26 mouse was humanely euthenized according to IACUC-approved protocols. Genomic DNA from the bone marrow of one Tg26 male mouse was extracted with special consideration for maintaining longer gDNA. Longrange PCR was performed to create ~6 kb amplicons. DNA was sequence on one MinION (Oxford Nanopore Technologies, Oxford, UK), with SQK-LSK109 accroding to manufacturer's protocol. Live basecalling with MinKNOW verion 19.12.5 was done on High-accuracy mode. Read filtering was done in Galaxy, specifying reads between 6 and 6.25 kb. Read mapping was done with minimap2 in Galaxy. Visualized in IGV. HIV-1 reference used: HIV-1 vector pNL4-3, complete sequence AF324493.2. All analyses as "base 1." No human patient data is deposited in this page.

Results

Table 1: Variants in a Tg26 mouse

There are 49 single nucleotide variants across the expected HIV portion of the transgene, supported by deep short read sequencing. Three are adjoining (and thus linked), so there were 46 variants across 15 HIV copies in this individual mouse.

Figure 1: Subset of reads that were acquired by the end of the long amplicon sequencing run The deletion seen toward left side is expected in this model. A few reads mapping at 3' LTR are likely 5' LTR reads with sequencing errors in key places of distinction between the two LTRs.

Sources:

Dickie P; Felser J; Eckhaus M; Bryant J; Silver J; Marinos N; Notkins AL. 1991. HIV-associated nephropathy in transgenic mice expressing HIV-1 genes. Virology 185(1):109-19PubMed: 1926769MGI: J:90297.

Gener, Alejandro R. 2019. “Full-Coverage Sequencing of HIV-1 Provirus from a Reference Plasmid.” BioRxiv, January, 611848. https://doi.org/10.1101/611848.

Gener, Alejandro R, and Jason T Kimata. 2019. “Full-Coverage Native RNA Sequencing of HIV-1 Viruses.” BioRxiv, January, 845610. https://doi.org/10.1101/845610.

Acknowledgements/Funding

Data was shared with permision. This work was funded in part by institutional support from Baylor College of Medicine; private funding by East Coast Oils, Inc., Jacksonville, Florida, and ARG’s own private funding, including Student Genomics (manuscripts in prep). ARG also received the PFLAG of Jacksonville scholarship for multiple years.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
HIV-ONT-Datasets		HIV-ONT-Datasets
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HIV-ONT-Datasets

Background

Method

Results

Sources:

Acknowledgements/Funding

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

HIV-ONT-Datasets

Background

Method

Results

Sources:

Acknowledgements/Funding

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages