Department of Biochemistry & Molecular Biology, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada.
Department of Cardiology, Xiangya Hospital, Central South University, Changsha, China.
Mol Biol Evol. 2021 May 19;38(6):2660-2672. doi: 10.1093/molbev/msab037.
DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or "haplotypes." However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
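The contrast the abstract draws between haplotype-level and single position-based analysis can be illustrated with a toy calculation. The sketch below is not PoolHapX's algorithm. It only assumes the standard mixture view of pooled sequencing, in which the expected alternate-allele frequency at a site is the frequency-weighted sum of the alleles carried by the haplotypes in the pool. The function name and the example haplotypes are hypothetical and chosen for illustration.

```python
def pooled_allele_frequencies(haplotypes, frequencies):
    """Expected alternate-allele frequency at each site of one pool.

    haplotypes:  list of equal-length tuples of 0/1 alleles (hypothetical data)
    frequencies: within-pool haplotype frequencies, which must sum to 1
    """
    if len(haplotypes) != len(frequencies):
        raise ValueError("need one frequency per haplotype")
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    n_sites = len(haplotypes[0])
    # Each site's frequency is the weighted sum of that site's alleles.
    return [
        sum(freq * hap[site] for hap, freq in zip(haplotypes, frequencies))
        for site in range(n_sites)
    ]


# Two pools with different haplotype content (illustrative only).
pool_a = pooled_allele_frequencies([(0, 0), (1, 1)], [0.5, 0.5])
pool_b = pooled_allele_frequencies([(0, 1), (1, 0)], [0.5, 0.5])

# Both pools show the same per-site frequencies, so a single-position
# analysis cannot tell them apart even though their haplotypes differ.
print(pool_a, pool_b)
```

In this toy case the per-site frequencies of the two pools are identical, which is why reconstruction methods need extra information such as reads spanning several sites or sharing of haplotypes across pools.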