Beerenwinkel Niko, Günthard Huldrych F, Roth Volker, Metzner Karin J
Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
Front Microbiol. 2012 Sep 11;3:329. doi: 10.3389/fmicb.2012.00329. eCollection 2012.
Many viruses, including the clinically relevant RNA viruses HIV (human immunodeficiency virus) and HCV (hepatitis C virus), exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing (NGS) technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different NGS platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants (SNVs) to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of NGS to estimate viral diversity.