Grimm Dominik G, Azencott Chloé-Agathe, Aicheler Fabian, Gieraths Udo, MacArthur Daniel G, Samocha Kaitlin E, Cooper David N, Stenson Peter D, Daly Mark J, Smoller Jordan W, Duncan Laramie E, Borgwardt Karsten M
Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany; Zentrum für Bioinformatik, Eberhard Karls Universität Tübingen, Tübingen, Germany; Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.
Hum Mutat. 2015 May;36(5):513-23. doi: 10.1002/humu.22768. Epub 2015 Mar 26.
Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. A large number of in silico tools have been employed for pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion (CADD), LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluating these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and that they even outperform optimized combinations of tools.
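The two types of circularity described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `Variant` record, the helper names `remove_type1` and `remove_type2`, and the toy variants are all hypothetical, and a real analysis would need proper variant normalization and protein identifier mapping. The sketch only shows what filtering an evaluation set against a training set could look like under those assumptions.

```python
from collections import namedtuple

# Hypothetical record layout; label 1 = pathogenic, 0 = neutral.
Variant = namedtuple("Variant", ["protein", "position", "ref", "alt", "label"])


def variant_key(v):
    """Identify a variant by protein, position and amino acid change."""
    return (v.protein, v.position, v.ref, v.alt)


def remove_type1(train, evaluation):
    """Drop evaluation variants that also occur in the training set."""
    seen = {variant_key(v) for v in train}
    return [v for v in evaluation if variant_key(v) not in seen]


def remove_type2(train, evaluation):
    """Drop evaluation variants from any protein represented in training."""
    train_proteins = {v.protein for v in train}
    return [v for v in evaluation if v.protein not in train_proteins]


# Toy data, invented purely for illustration.
train = [
    Variant("P1", 10, "A", "V", 1),
    Variant("P2", 55, "G", "R", 0),
]
evaluation = [
    Variant("P1", 10, "A", "V", 1),  # same variant as in training (type 1)
    Variant("P1", 42, "L", "P", 1),  # same protein, different variant (type 2)
    Variant("P3", 7, "S", "F", 0),   # protein unseen in training
]

no_type1 = remove_type1(train, evaluation)
no_type2 = remove_type2(train, evaluation)
print(len(evaluation), len(no_type1), len(no_type2))
```

Filtering at the protein level is the stricter of the two, since it also removes every exact duplicate: any variant shared with the training set necessarily comes from a protein seen in training.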