Cai Dehan, Sun Yanni
Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China.
Bioinformatics. 2022 Apr 12;38(8):2127-2134. doi: 10.1093/bioinformatics/btac089.
Most RNA viruses lack strict proofreading during replication. Coupled with a high replication rate, this allows some RNA viruses to form a population containing a group of genetically related but distinct haplotypes. Characterizing the haplotype composition of a virus population is therefore important for understanding viral evolution. Many attempts have been made to reconstruct viral haplotypes from next-generation sequencing (NGS) reads. However, NGS reads are too short to span distant single-nucleotide variants, which makes it difficult to reconstruct complete or near-complete haplotypes. The rapid development of third-generation sequencing technologies offers a new opportunity to reconstruct full-length haplotypes from long reads.
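The read-length argument above can be illustrated with a minimal sketch. It is not part of RVHaplo; the function names, read lengths, and variant positions are hypothetical and chosen only for illustration. A read can link two single-nucleotide variants only if it covers both positions, so the number of start positions from which a read could do so drops to zero once the variants are farther apart than the read is long.

```python
def read_spans_both(read_start, read_length, snv_a, snv_b):
    """Return True if a read covering [read_start, read_start + read_length)
    contains both variant positions (0-based genome coordinates)."""
    read_end = read_start + read_length
    return read_start <= min(snv_a, snv_b) and max(snv_a, snv_b) < read_end


def linking_starts(read_length, snv_a, snv_b):
    """Count the start positions (>= 0) from which a read of the given
    length would cover both variant positions."""
    lo, hi = sorted((snv_a, snv_b))
    # The read must start at or before the first variant (start <= lo)
    # and extend past the second one (start + read_length > hi).
    first_valid_start = max(0, hi - read_length + 1)
    return max(0, lo - first_valid_start + 1)


# Hypothetical variants 3,000 bases apart.
short_read_links = linking_starts(150, 1000, 4000)   # typical short-read length
long_read_links = linking_starts(5000, 1000, 4000)   # illustrative long-read length
```

In this toy setting, no 150-base read can place the two variants on the same molecule, whereas a 5,000-base read can, which is why long reads make full-length haplotype reconstruction feasible.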
In this work, we developed a new tool, RVHaplo, to reconstruct haplotypes of known viruses from long reads. We tested it rigorously on both simulated and real viral sequencing data and compared it against other popular haplotype reconstruction tools. The results demonstrate that RVHaplo outperforms state-of-the-art tools for viral haplotype reconstruction from long reads. In particular, RVHaplo can reconstruct rare (1% abundance) haplotypes that other tools usually miss.
The source code and the documentation of RVHaplo are available at https://github.com/dhcai21/RVHaplo.
Supplementary data are available at Bioinformatics online.