Kim Bo-Young, Park Jung Hoon, Jo Hye-Yeong, Koo Soo Kyung, Park Mi-Hyun
Division of Intractable Diseases, Center for Biomedical Sciences, Korea National Institute of Health, Chungcheongbuk-do, South Korea.
Macrogen Inc., Gasan-dong, Seoul, South Korea.
PLoS One. 2017 Aug 9;12(8):e0182272. doi: 10.1371/journal.pone.0182272. eCollection 2017.
Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common as costs fall and as sequencing platforms and analytical tools improve in efficiency and sensitivity. However, INDEL variant calling is still error-prone, and distinguishing true INDELs from NGS errors remains challenging. To evaluate INDEL calling from whole-exome sequencing (WES) data, we performed Sanger sequencing for all INDELs called by several calling algorithms. We compared the performance of four algorithms (GATK, SAMtools, Dindel, and Freebayes) for INDEL detection from the same sample. The sensitivity and positive predictive value (PPV) were 90.2% and 89.5% for GATK, 75.3% and 94.4% for SAMtools, 90.1% and 88.6% for Dindel, and 80.1% and 94.4% for Freebayes. GATK had the highest sensitivity. Furthermore, INDELs called by more than one algorithm had high PPV (intersection of all four algorithms: 98.7%; intersection of three algorithms: 97.6%; intersection of GATK and SAMtools: 97.6%). We identified two key sources of difficulty in accurate INDEL detection: 1) the presence of repeats, and 2) heterozygous INDELs. We suggest accessible algorithms that selectively reduce error rates and thereby facilitate INDEL detection. Our study may also serve as a basis for understanding the accuracy and completeness of INDEL detection.
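The two metrics reported above, and the effect of intersecting call sets, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes each INDEL is keyed by a (chromosome, position, reference allele, alternate allele) tuple, that sensitivity is measured against the set of Sanger-confirmed INDELs, and that the function name `evaluate` and all toy variants below are hypothetical.

```python
# Minimal sketch (not the study's code): sensitivity and PPV of a call set
# against a set of validated INDELs, plus a caller intersection.
# All variant records below are made-up toy data.

def evaluate(calls, validated):
    """Return (sensitivity, ppv) for a set of calls given validated INDELs."""
    tp = len(calls & validated)   # called and confirmed
    fp = len(calls - validated)   # called but not confirmed
    fn = len(validated - calls)   # confirmed but missed by this caller
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv

# Hypothetical INDELs keyed as (chrom, pos, ref, alt).
a = ("chr1", 1000, "AT", "A")
b = ("chr1", 2500, "G", "GCC")
c = ("chr2", 300, "CAAA", "C")
d = ("chr3", 42, "T", "TA")
e = ("chr4", 77, "GG", "G")      # a false positive in this toy example

validated = {a, b, c, d}         # stand-in for Sanger-confirmed INDELs
gatk = {a, b, c, e}              # stand-in for one caller's output
samtools = {a, b}                # stand-in for a more conservative caller

print(evaluate(gatk, validated))             # (0.75, 0.75)
print(evaluate(samtools, validated))         # (0.5, 1.0)
print(evaluate(gatk & samtools, validated))  # (0.5, 1.0)
```

In this toy data the intersection keeps only calls that both callers agree on, which removes the false positive and raises PPV at the cost of sensitivity. That is the same trade-off the intersection figures in the abstract describe.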