Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden.
PLoS Comput Biol. 2019 Feb 11;15(2):e1006481. doi: 10.1371/journal.pcbi.1006481. eCollection 2019 Feb.
Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation in precision medicine and should be based on systematic performance assessment. The reported performance of the methods varies widely between assessments, for example because of differences in the contents and sizes of the test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, that is, the capability to detect benign variants, of 10 variant interpretation tools. In addition to the overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the highest specificity (95.5%), followed by FATHMM (86.4%) and VEST (83.5%). While these tools performed very well, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow reliable methods to be chosen for benign variant interpretation, for both research and clinical purposes, and provide a benchmark for method developers.
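The specificity measure used here can be illustrated with a minimal sketch. It assumes that every variant in the common-frequency window (allele frequency ≥1% and <25%) is benign, so specificity reduces to the fraction of those variants a tool predicts as benign. The `Variant` record, its field names, and the helper functions are hypothetical and introduced only for illustration; they are not the study's actual pipeline or any tool's API.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record: one amino acid substitution and a tool's call."""
    allele_frequency: float   # e.g. 0.05 for 5%
    predicted_benign: bool    # True if the tool classified it as benign


def is_common(variant: Variant, low: float = 0.01, high: float = 0.25) -> bool:
    """Keep variants with allele frequency >= 1% and < 25%."""
    return low <= variant.allele_frequency < high


def specificity(variants: list[Variant]) -> float:
    """Specificity = TN / (TN + FP), computed over common variants only.

    All common variants are assumed to be benign, so a benign prediction
    is a true negative and a disease-causing prediction is a false positive.
    """
    common = [v for v in variants if is_common(v)]
    if not common:
        raise ValueError("no variants fall inside the allele frequency window")
    true_negatives = sum(v.predicted_benign for v in common)
    return true_negatives / len(common)


# Toy data, not taken from ExAC or from any tool's output.
example = [
    Variant(0.05, True),
    Variant(0.10, False),
    Variant(0.30, True),    # excluded: frequency >= 25%
    Variant(0.005, False),  # excluded: frequency < 1%
    Variant(0.01, True),
    Variant(0.20, True),
]
print(f"specificity = {specificity(example):.2%}")
```

Because the dataset contains only variants assumed to be benign, this kind of evaluation measures specificity alone; it says nothing about a tool's sensitivity for disease-causing variants.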