Lee Hojun, Lee Ki-Wook, Lee Taeseob, Park Donghyun, Chung Jongsuk, Lee Chung, Park Woong-Yang, Son Dae-Soon
1 Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul 06351, South Korea.
2 Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul 06351, South Korea.
Genes Genomics. 2018;40(2):189-197. doi: 10.1007/s13258-017-0621-9. Epub 2017 Nov 9.
Alongside the rapid advancement of next-generation sequencing (NGS) technology, clinical panel sequencing is increasingly used in clinical studies and tests. However, the tools used in NGS data analysis have not been comparatively evaluated for their performance on panel sequencing. This study aimed to evaluate tools for alignment, the first step in bioinformatics analysis, by comparing widely used tools with recently introduced ones. Variants detected in accumulated panel sequencing data were cataloged and inserted into simulated reads generated from the reference genome (hg19). The numbers of unmapped and misaligned reads, the mapping quality distribution, and runtime were measured as criteria for comparison. The most widely used tools, Bowtie2 and BWA-MEM, both performed well, with AUCs of 0.9984 and 0.9970, respectively. Kart, which had a superior runtime and fewer misaligned reads, also achieved a high AUC (0.9723). This approach to selecting and optimizing tools for panel sequencing can be applied in fields that require error minimization, such as clinical applications and liquid biopsy studies.
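The comparison criteria named in the abstract (unmapped reads, misaligned reads, and an AUC) can be illustrated with a minimal sketch. The abstract does not say how the AUC was computed, so this example assumes one common choice: mapping quality (MAPQ) is treated as a score, and a mapped read counts as correct if it lands within a small tolerance of the position it was simulated from. The `SimRead` record, the `tolerance` value, and the function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SimRead:
    """One simulated read with its known origin (hypothetical record layout)."""
    true_pos: int                # position the read was simulated from
    aligned_pos: Optional[int]   # position reported by the aligner, None if unmapped
    mapq: int                    # mapping quality reported by the aligner


def is_correct(read: SimRead, tolerance: int = 5) -> bool:
    """A mapped read is correct if it lies within `tolerance` bp of its origin (assumed rule)."""
    return read.aligned_pos is not None and abs(read.aligned_pos - read.true_pos) <= tolerance


def mapq_auc(mapped: List[SimRead], tolerance: int = 5) -> float:
    """Rank-based AUC: probability that a correct read has a higher MAPQ than a
    misaligned read, with ties counted as one half. Quadratic in the number of
    reads, which is acceptable for a sketch but not for full-size data."""
    pos = [r.mapq for r in mapped if is_correct(r, tolerance)]
    neg = [r.mapq for r in mapped if not is_correct(r, tolerance)]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(reads: List[SimRead], tolerance: int = 5) -> Dict[str, float]:
    """Count unmapped and misaligned reads and compute the MAPQ-based AUC."""
    unmapped = sum(1 for r in reads if r.aligned_pos is None)
    mapped = [r for r in reads if r.aligned_pos is not None]
    misaligned = sum(1 for r in mapped if not is_correct(r, tolerance))
    return {"unmapped": unmapped, "misaligned": misaligned, "auc": mapq_auc(mapped, tolerance)}


# Toy input, invented for illustration only (not data from the study).
example = [
    SimRead(true_pos=100, aligned_pos=100, mapq=60),
    SimRead(true_pos=200, aligned_pos=202, mapq=40),
    SimRead(true_pos=300, aligned_pos=900, mapq=0),
    SimRead(true_pos=400, aligned_pos=None, mapq=0),
    SimRead(true_pos=500, aligned_pos=1500, mapq=40),
]
result = summarize(example)
print(result)
```

Under this definition, an aligner scores well when it assigns low MAPQ to the reads it misplaces, so a high AUC and a low misaligned-read count measure different things and are worth reporting side by side.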