Karagiannis Konstantinos, Simonyan Vahan, Chumakov Konstantin, Mazumder Raja
Department of Biochemistry and Molecular Medicine, George Washington University Medical Center, Washington, DC 20037, USA.
Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA.
Nucleic Acids Res. 2017 Nov 2;45(19):10989-11003. doi: 10.1093/nar/gkx755.
Sequence heterogeneity is a common characteristic of RNA viruses, whose variant populations are often referred to as sub-populations or quasispecies. Traditional techniques for assembling the short sequence reads produced by deep sequencing, such as de novo assemblers, ignore this underlying diversity. Here, we introduce a novel algorithm that simultaneously assembles the discrete sequences of multiple genomes present in a population. Using in silico data, we were able to detect populations at frequencies as low as 0.1% with complete global genome reconstruction, and in a single sample we detected 16 resolved sequences with no mismatches. We also applied the algorithm to high-throughput sequencing data obtained for viruses present in sewage samples and successfully detected multiple sub-populations and recombination events in these diverse mixtures. The high sensitivity of the algorithm also enables genomic analysis of heterogeneous pathogen genomes from patient samples and accurate detection of intra-host diversity, supporting not only basic research in personalized medicine but also accurate diagnostics and monitoring of drug therapies, which are critical in clinical and regulatory decision-making.