Welkers Matthijs R A, Jonges Marcel, Jeeninga Rienk E, Koopmans Marion P G, de Jong Menno D
Department of Medical Microbiology, Academic Medical Centre Amsterdam, Netherlands.
Centre for Infectious Disease Control, National Institute for Public Health and the Environment Bilthoven, Netherlands; Department of Viroscience, Erasmus Medical Center Rotterdam, Netherlands.
Front Microbiol. 2015 Jan 22;5:804. doi: 10.3389/fmicb.2014.00804. eCollection 2014.
High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to Illumina HiSeq2000 library generation and the HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool: infectious virus was generated by reverse genetics, followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. After "best practice" quality control (QC), one minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the two RT-PCR amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that, by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were classified as artifacts. Our study demonstrates that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs).
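The two technical characteristics named in the abstract (presence in predominantly one read orientation, and uneven distribution of mismatches along the read) can be illustrated with a minimal sketch. This is not the authors' implementation, which is the code published at the Sourceforge link above. The class `VariantEvidence`, the function names, the four-bin split of the read, and the thresholds 0.9 and 0.6 are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantEvidence:
    """Reads supporting one candidate minority variant (hypothetical structure)."""
    forward_reads: int         # supporting reads mapped in forward orientation
    reverse_reads: int         # supporting reads mapped in reverse orientation
    read_positions: List[int]  # 0-based offset of the mismatch within each supporting read
    read_length: int           # read length in bases


def strand_fraction(ev: VariantEvidence) -> float:
    """Fraction of supporting reads that come from the dominant orientation."""
    total = ev.forward_reads + ev.reverse_reads
    if total == 0:
        return 0.0
    return max(ev.forward_reads, ev.reverse_reads) / total


def position_skew(ev: VariantEvidence, bins: int = 4) -> float:
    """Largest fraction of mismatches that fall into a single segment of the read."""
    if not ev.read_positions:
        return 0.0
    counts = [0] * bins
    for pos in ev.read_positions:
        # Map the within-read offset to one of `bins` equal segments.
        idx = min(pos * bins // ev.read_length, bins - 1)
        counts[idx] += 1
    return max(counts) / len(ev.read_positions)


def is_probable_artifact(
    ev: VariantEvidence,
    max_strand_fraction: float = 0.9,  # assumed threshold, not taken from the paper
    max_bin_fraction: float = 0.6,     # assumed threshold, not taken from the paper
) -> bool:
    """Flag a variant if either illustrative QC criterion is violated."""
    return (
        strand_fraction(ev) > max_strand_fraction
        or position_skew(ev) > max_bin_fraction
    )
```

In this sketch, a variant supported evenly by both orientations and spread along the read passes both checks, whereas a variant seen almost only in forward reads, or one whose mismatches cluster near one end of the read, is flagged. Real thresholds would need to be calibrated against a control such as the plasmid pool described above.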