Song Zhi, Cai Dehan, Sun Yanni, Wang Lusheng
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.
Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf063.
Viral genome analysis is crucial for understanding virus evolution and mutation, and viral evolutionary dynamics and mutation patterns have attracted significant research attention since the outbreak of COVID-19. The basic structure of many virus genomes is highly conserved [1]. RNA viruses, however, have high mutation rates, and single-nucleotide variations may induce substantial phenotypic changes in viral function and pathogenicity. Thus, special assembly methods are required for viral genome analysis.
PVGA takes a reference genome and a set of sequencing reads as input. It first constructs an alignment graph from the reference genome and the reads. The optimal genomic path is then determined through dynamic programming, maximizing the cumulative edge weights, which reflect read support density across the alignment graph. The resulting path corresponds to a refined genome. Finally, the process is repeated with the refined genome as the new reference until no further improvement is possible. We evaluate PVGA's performance on both assembly and polishing tasks using simulated and real datasets, including both long reads and short reads. In these experiments, PVGA always outperforms popular existing programs in terms of the quality of the assembly results, while its running time is comparable to that of the other programs. In particular, on simulated Nanopore datasets, our method recovers the true genomes with 0 mismatches and 0 indels.
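The graph-and-path loop described above can be illustrated with a deliberately simplified sketch. It is not PVGA's implementation: it assumes substitution-only errors, places each read by a gap-free minimum-mismatch scan rather than a real aligner, and uses hypothetical function names (`best_offset`, `build_graph`, `heaviest_path`, `polish`) introduced only for this example. Nodes are (position, base) pairs, an edge weight counts the reads supporting a transition between adjacent positions, and the reference contributes zero-weight backbone edges so that a full-length path always exists.

```python
from collections import defaultdict


def best_offset(ref, read):
    """Toy stand-in for an aligner: gap-free offset with the fewest mismatches.

    Assumes len(read) <= len(ref).
    """
    return min(range(len(ref) - len(read) + 1),
               key=lambda o: sum(a != b for a, b in zip(ref[o:], read)))


def build_graph(ref, reads):
    """Map (position, base, next_base) -> number of reads supporting that edge."""
    weight = defaultdict(int)
    # Zero-weight reference backbone keeps every position reachable.
    for i in range(len(ref) - 1):
        weight[(i, ref[i], ref[i + 1])] += 0
    for read in reads:
        o = best_offset(ref, read)
        for j in range(len(read) - 1):
            weight[(o + j, read[j], read[j + 1])] += 1
    return weight


def heaviest_path(n, weight):
    """Dynamic programming over positions: maximize cumulative edge weight."""
    edges_at = defaultdict(list)
    for (i, a, b), w in weight.items():
        edges_at[i].append((a, b, w))
    score = [{} for _ in range(n)]  # score[i][base]: best weight ending at (i, base)
    back = [{} for _ in range(n)]   # back[i][base]: predecessor base at position i - 1
    for a, _, _ in edges_at[0]:
        score[0][a] = 0
    for i in range(n - 1):
        for a, b, w in edges_at[i]:
            if a not in score[i]:
                continue  # node not reachable from position 0
            cand = score[i][a] + w
            if cand > score[i + 1].get(b, -1):
                score[i + 1][b] = cand
                back[i + 1][b] = a
    base = max(score[n - 1], key=score[n - 1].get)
    path = [base]
    for i in range(n - 1, 0, -1):
        base = back[i][base]
        path.append(base)
    return "".join(reversed(path))


def polish(ref, reads, max_rounds=10):
    """Repeat graph construction and path search until the genome stops changing."""
    for _ in range(max_rounds):
        refined = heaviest_path(len(ref), build_graph(ref, reads))
        if refined == ref:
            break
        ref = refined
    return ref


# Toy input: the reference differs from the sampled genome by one substitution.
reads = ["GATTACAG", "TACAGCTT", "CAGCTTGCA", "TTACAGCT"]
print(polish("GATTACATCTTGCA", reads))
```

In this toy case the reads outvote the reference at the substituted position, and the second round leaves the sequence unchanged, which ends the loop. Handling insertions and deletions would require a real alignment step and a richer graph than this sketch provides.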
PVGA is a novel viral genome assembler that seamlessly integrates assembly and polishing into a unified workflow. Its design prioritizes high accuracy, enabling the detection of subtle genomic variations that can impact viral function and pathogenicity. By addressing the unique challenges of viral genome assembly, PVGA provides a reliable and precise solution for advancing our understanding of viral evolution and behavior.