The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity.

Author information

Grimm Dominik G, Azencott Chloé-Agathe, Aicheler Fabian, Gieraths Udo, MacArthur Daniel G, Samocha Kaitlin E, Cooper David N, Stenson Peter D, Daly Mark J, Smoller Jordan W, Duncan Laramie E, Borgwardt Karsten M

Affiliation

Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany; Zentrum für Bioinformatik, Eberhard Karls Universität Tübingen, Tübingen, Germany; Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

Publication information

Hum Mutat. 2015 May;36(5):513-23. doi: 10.1002/humu.22768. Epub 2015 Mar 26.

Abstract

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. Many in silico tools have been used to predict pathogenicity, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion (CADD), LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the abundance of these methods, an important practical question is which tools generalize best, that is, which correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used to train these tools and in those used to evaluate them, which may lead to overly optimistic results. We show that comparative evaluations that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate of all tools, and even that they outperform optimized combinations of tools.
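The two types of circularity can be made concrete with a small check that compares a training set against an evaluation set. The following is a hypothetical Python sketch, not the authors' code: it assumes each variant is keyed by a protein identifier, residue position, and reference and alternate amino acids, and the names `Variant`, `circularity_report`, and `protein_disjoint_split` are introduced here for illustration only. It counts evaluation variants that also appear in training (type 1), variants that are new but lie in a protein seen during training (type 2), and variants from proteins absent from training. Splitting by protein is shown as one simple way to exclude both types; whether it matches the procedure used in the study is an assumption.

```python
import random
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical missense-variant key: protein ID, residue position, ref/alt amino acid."""
    protein: str
    position: int
    ref: str
    alt: str


def circularity_report(train: Iterable[Variant], test: Iterable[Variant]) -> Dict[str, int]:
    """Count evaluation variants affected by each type of circularity."""
    train_set = set(train)
    train_proteins = {v.protein for v in train_set}
    test_list = list(test)
    # Type 1: the identical variant also occurs in the training data.
    type1 = sum(1 for v in test_list if v in train_set)
    # Type 2: a different variant, but from a protein seen during training.
    type2 = sum(1 for v in test_list if v not in train_set and v.protein in train_proteins)
    # Neither type: the variant's protein is absent from the training data.
    clean = sum(1 for v in test_list if v.protein not in train_proteins)
    return {"type1": type1, "type2": type2, "clean": clean, "total": len(test_list)}


def protein_disjoint_split(
    variants: List[Variant], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Variant], List[Variant]]:
    """Assign whole proteins to either training or evaluation so no protein is shared."""
    proteins = sorted({v.protein for v in variants})
    random.Random(seed).shuffle(proteins)
    # At least one protein goes to the evaluation side.
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [v for v in variants if v.protein not in test_proteins]
    test = [v for v in variants if v.protein in test_proteins]
    return train, test


if __name__ == "__main__":
    train = [Variant("P1", 10, "A", "V"), Variant("P1", 20, "G", "D"), Variant("P2", 5, "L", "P")]
    test = [Variant("P1", 10, "A", "V"), Variant("P2", 7, "R", "W"), Variant("P3", 3, "K", "E")]
    print(circularity_report(train, test))
    # {'type1': 1, 'type2': 1, 'clean': 1, 'total': 3}
```

To audit an existing predictor this way, `train` would have to be that tool's own training set, which is only possible when it is published. Because `protein_disjoint_split` places every variant of a protein on one side, its evaluation set is free of both types by construction, at the cost of measuring performance only on proteins the model has never seen.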
