Freire Borja, Ladra Susana, Parama Jose R, Salmela Leena
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1550-1562. doi: 10.1109/TCBB.2022.3190282. Epub 2023 Apr 3.
During viral infection, intrahost mutation and recombination can lead to significant evolution, resulting in a population of viruses that harbor multiple haplotypes. The task of reconstructing these haplotypes from short-read sequencing data is called viral quasispecies assembly, and it can be categorized as a multiassembly problem. We consider the de novo version of the problem, where no reference is available. We present ViQUF, a de novo viral quasispecies assembler that addresses both haplotype assembly and quantification. ViQUF obtains a first draft of the assembly graph from a de Bruijn graph. Next, for each pair of adjacent vertices, it builds a flow network from their paired-end information and solves a min-cost flow problem on it; the result is an approximate paired assembly graph whose edge labels are suggested frequency values, which gives a first frequency estimate. Finally, the original haplotypes are recovered by a greedy path reconstruction guided by a min-cost flow solution in the approximate paired assembly graph. ViQUF outputs the contigs with their frequency estimates. Results on real and simulated data show that ViQUF is at least four times faster than previous methods and uses at most half the memory, while matching, and in some cases exceeding, the assembly and frequency-estimation quality of overlap graph-based methods, which are known to be more accurate but slower than de Bruijn graph-based approaches.
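The final step described in the abstract, recovering haplotypes as paths from a flow solution, can be illustrated with a minimal sketch of greedy path decomposition. This is not ViQUF's implementation: the function name `greedy_paths`, the toy graph, the integer flow values, and the "follow the heaviest remaining edge" rule are all assumptions made for illustration. The sketch assumes an acyclic graph whose edge flows already satisfy flow conservation, as a min-cost flow solution would.

```python
from collections import defaultdict


def greedy_paths(flow, source, sink):
    """Greedily split a flow on a DAG into weighted source-to-sink paths.

    flow maps (u, v) edges to non-negative flow values that are assumed
    to satisfy flow conservation at every vertex except source and sink.
    Returns a list of (path, weight) pairs. Hypothetical helper, not ViQUF code.
    """
    residual = dict(flow)
    out_edges = defaultdict(list)
    for u, v in flow:
        out_edges[u].append(v)

    paths = []
    while True:
        # Walk from the source, always taking the heaviest remaining edge.
        path = [source]
        node = source
        while node != sink:
            candidates = [v for v in out_edges[node] if residual[(node, v)] > 0]
            if not candidates:
                break
            node = max(candidates, key=lambda v: residual[(path[-1], v)])
            path.append(node)
        if node != sink:
            break  # no flow left to route

        # The path weight is its bottleneck; subtract it from every edge used.
        edges = list(zip(path, path[1:]))
        weight = min(residual[e] for e in edges)
        for e in edges:
            residual[e] -= weight
        paths.append((path, weight))
    return paths


# Toy example (assumed data): two variants that share the middle vertex "c".
toy_flow = {
    ("s", "a"): 70, ("s", "b"): 30,
    ("a", "c"): 70, ("b", "c"): 30,
    ("c", "d"): 70, ("c", "e"): 30,
    ("d", "t"): 70, ("e", "t"): 30,
}
result = greedy_paths(toy_flow, "s", "t")
for path, weight in result:
    print("-".join(path), weight)
```

In this toy graph the two paths carry weights 70 and 30, which can be read as relative abundances out of 100. Note that at the shared vertex `c` the flow values alone do not determine which incoming branch belongs with which outgoing branch; in the abstract's description, paired-end information is what the approximate paired assembly graph contributes before paths are reconstructed.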