Erasmus MC, Department of Viroscience, Rotterdam, the Netherlands.
Institute of Diagnostic Virology, Friedrich-Loeffler-Institute, Insel Riems, Germany.
PLoS One. 2020 Feb 20;15(2):e0229326. doi: 10.1371/journal.pone.0229326. eCollection 2020.
As high-throughput sequencing technologies become more widely adopted for analysing pathogens in disease outbreaks, there needs to be assurance that the different sequencing technologies and approaches to data analysis will yield reliable and comparable results. Conversely, understanding where agreement cannot be achieved provides insight into the limitations of these approaches and allows efforts to be focused on the parts of the process that need improvement. This manuscript describes the next-generation sequencing of three closely related viruses, each analysed using different sequencing strategies, sequencing instruments and data processing pipelines. To determine the comparability of consensus sequences and of minority (sub-consensus) single nucleotide variant (mSNV) identification, the biological samples, the sequence data from 3 sequencing platforms and the *.bam quality-trimmed alignment files of raw data of 3 influenza A/H5N8 viruses were shared. This analysis demonstrated that variation in the final result could be attributed to all stages of the process, but the most critical were the well-known homopolymer errors introduced by 454 sequencing and the alignment processes in the different data processing pipelines, which affected the consistency of mSNV detection. Homopolymer errors aside, there was generally good agreement between the consensus sequences obtained for all combinations of sequencing platforms and data processing pipelines. Minority variant analysis, however, will require more careful standardization and awareness of its possible limitations, as shown in this study.
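The distinction the abstract draws between consensus agreement and mSNV agreement can be illustrated with a minimal sketch. It assumes that each pipeline has already been reduced to per-position base counts; the function names `call_site` and `compare_calls`, the 2% frequency threshold and the minimum depth of 100 reads are hypothetical choices made for illustration and are not taken from the study.

```python
def call_site(base_counts, min_freq=0.02, min_depth=100):
    """Return (consensus_base, minority_variants) for one alignment column.

    base_counts maps a base ('A', 'C', 'G', 'T') to its read count.
    minority_variants maps each non-consensus base whose frequency is at
    least min_freq to that frequency; it is left empty when the depth is
    below min_depth. Both thresholds are illustrative assumptions.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        return "N", {}
    # Sorting first makes ties resolve to the alphabetically first base.
    consensus = max(sorted(base_counts), key=lambda b: base_counts[b])
    if depth < min_depth:
        return consensus, {}
    minority = {
        base: count / depth
        for base, count in base_counts.items()
        if base != consensus and count / depth >= min_freq
    }
    return consensus, minority


def compare_calls(calls_a, calls_b):
    """Compare per-position calls from two pipelines.

    Each argument maps a position to the tuple returned by call_site.
    Returns (positions with differing consensus,
             mSNVs reported only by A, mSNVs reported only by B).
    """
    shared = sorted(set(calls_a) & set(calls_b))
    consensus_mismatches = [p for p in shared if calls_a[p][0] != calls_b[p][0]]
    msnv_a = {(p, base) for p in shared for base in calls_a[p][1]}
    msnv_b = {(p, base) for p in shared for base in calls_b[p][1]}
    return consensus_mismatches, msnv_a - msnv_b, msnv_b - msnv_a


# Two hypothetical pipelines that align the same reads slightly differently.
pipeline_a = {10: call_site({"A": 950, "G": 50}), 11: call_site({"C": 1000})}
pipeline_b = {10: call_site({"A": 990, "G": 10}), 11: call_site({"C": 1000})}
mismatches, only_a, only_b = compare_calls(pipeline_a, pipeline_b)
```

In this toy case the two pipelines agree on the consensus at both positions, yet only the first reports a minority G at position 10, because the second places fewer variant reads there and falls below the threshold. This mirrors the pattern described in the abstract, where consensus sequences were largely concordant while mSNV detection was sensitive to alignment differences.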