Kısakol Batuhan, Sarıhan Şahin, Ergün Mehmet Arif, Baysan Mehmet
Department of Physiology and Medical Physics, Centre for Systems Medicine, Royal College of Surgeons in Ireland, Dublin, Ireland.
Computer Engineering Department, Faculty of Engineering, Marmara University, İstanbul, Turkey.
Turk J Biol. 2021 Apr 20;45(2):114-126. doi: 10.3906/biy-2008-8. eCollection 2021.
The importance of next generation sequencing (NGS) in cancer research is rising as this key technology becomes more accessible to researchers. The sequence data produced by NGS technologies must be processed by a series of bioinformatics algorithms within a pipeline to convert raw data into meaningful information. Mapping and variant calling are the two main steps of these analysis pipelines, and many algorithms are available for each step. Detailed benchmarking of these algorithms in different scenarios is therefore crucial for the efficient use of sequencing technologies. In this study, we compared the performance of twelve pipelines (every pairing of three mapping and four variant discovery algorithms), run with recommended settings, in capturing single nucleotide variants. We observed significant discrepancies in variant calls among the tested pipelines across different heterogeneity levels in real and simulated samples, with overall high specificity and low sensitivity. In addition to evaluating the pipelines individually, we constructed pipeline combinations and tested their performance. In these analyses, we observed that certain pipelines complement each other much better than others and that such combinations outperform individual pipelines. This suggests that adhering to a single pipeline is not optimal for cancer sequencing analysis and that sample heterogeneity should be considered in algorithm optimization.
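The idea of combining pipelines can be illustrated with a minimal sketch. It assumes that each pipeline's output has been reduced to a set of (chromosome, position, ref, alt) tuples and that a truth set is available, as it would be for simulated samples. The pipeline names, the toy variants, and the choice of a simple union as the combination rule are all hypothetical and are not taken from the study. Precision stands in for specificity here because counting true negatives requires defining the set of callable positions, which this sketch does not attempt.

```python
from itertools import combinations


def sensitivity(calls, truth):
    """Fraction of true variants that appear in the call set."""
    return len(calls & truth) / len(truth) if truth else 0.0


def precision(calls, truth):
    """Fraction of called variants that are true variants."""
    return len(calls & truth) / len(calls) if calls else 0.0


def best_pair(pipelines, truth):
    """Score the union of every pipeline pair against the truth set.

    Returns (pair_names, sensitivity, precision) for the pair with the
    highest sensitivity, using precision to break ties.
    """
    scored = []
    for a, b in combinations(sorted(pipelines), 2):
        union = pipelines[a] | pipelines[b]
        scored.append(((a, b), sensitivity(union, truth), precision(union, truth)))
    return max(scored, key=lambda row: (row[1], row[2]))


# Toy data: hypothetical variants and pipeline names, for illustration only.
v1 = ("chr1", 100, "A", "T")
v2 = ("chr1", 200, "G", "C")
v3 = ("chr2", 50, "C", "T")
v4 = ("chr2", 75, "T", "G")
false_positive = ("chr3", 10, "A", "G")

truth = {v1, v2, v3, v4}
pipelines = {
    "mapperA+callerX": {v1, v2},
    "mapperA+callerY": {v1, v2, false_positive},
    "mapperB+callerX": {v3, v4},
}

pair, sens, prec = best_pair(pipelines, truth)
print(pair, sens, prec)
```

In this toy example each pipeline alone recovers only half of the true variants, while the union of two pipelines that miss different variants recovers all of them. A union can also accumulate false positives, so any real combination rule would need to be assessed on both measures.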