The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity.

Author information

Grimm Dominik G, Azencott Chloé-Agathe, Aicheler Fabian, Gieraths Udo, MacArthur Daniel G, Samocha Kaitlin E, Cooper David N, Stenson Peter D, Daly Mark J, Smoller Jordan W, Duncan Laramie E, Borgwardt Karsten M

Affiliations

Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, Tübingen, Germany; Zentrum für Bioinformatik, Eberhard Karls Universität Tübingen, Tübingen, Germany; Department for Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland.

Publication information

Hum Mutat. 2015 May;36(5):513-23. doi: 10.1002/humu.22768. Epub 2015 Mar 26.

DOI:10.1002/humu.22768

PMID:25684150

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4409520/

Abstract

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. A large number of in silico tools have been employed for pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion (CADD), LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, which correctly predict the pathogenic character of new variants. In a study of 10 tools on five datasets, we demonstrate that such a comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluating these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate of all tools, and that they may even outperform optimized combinations of tools.
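The two types of circularity named in the abstract can be made concrete with a small overlap check. This is a minimal sketch, not the authors' pipeline: it assumes each variant is a record of protein identifier, residue position, reference and substituted amino acid, and label, and the names `Variant`, `circularity_report`, and `protein_disjoint_split` are hypothetical. `circularity_report` counts evaluation variants that also occur in the training data (type 1) or that are new but come from a protein already represented in the training data (type 2); `protein_disjoint_split` shows one way to avoid both, by assigning whole proteins to either the training or the evaluation set.

```python
import random
from typing import NamedTuple


class Variant(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    protein: str       # protein identifier, e.g. a UniProt accession
    position: int      # residue position within the protein
    ref_aa: str        # reference amino acid (one-letter code)
    alt_aa: str        # substituted amino acid (one-letter code)
    pathogenic: bool   # label used for training / evaluation

    def key(self) -> tuple:
        # Identity of the substitution, independent of its label.
        return (self.protein, self.position, self.ref_aa, self.alt_aa)


def circularity_report(train: list[Variant], test: list[Variant]) -> dict[str, int]:
    """Count evaluation variants affected by each type of circularity."""
    train_keys = {v.key() for v in train}
    train_proteins = {v.protein for v in train}
    # Type 1: the identical variant was seen during training.
    type1 = sum(1 for v in test if v.key() in train_keys)
    # Type 2: a different variant, but from a protein seen during training.
    type2 = sum(1 for v in test
                if v.key() not in train_keys and v.protein in train_proteins)
    return {"type1_same_variant": type1,
            "type2_same_protein": type2,
            "clean": len(test) - type1 - type2}


def protein_disjoint_split(variants: list[Variant], test_fraction: float = 0.2,
                           seed: int = 0) -> tuple[list[Variant], list[Variant]]:
    """Assign whole proteins to train or test, so neither circularity type can occur."""
    proteins = sorted({v.protein for v in variants})
    random.Random(seed).shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [v for v in variants if v.protein not in test_proteins]
    test = [v for v in variants if v.protein in test_proteins]
    return train, test


# Usage with made-up variants:
train = [Variant("P1", 10, "R", "W", True), Variant("P2", 5, "G", "A", False)]
test = [Variant("P1", 10, "R", "W", True),   # type 1: identical variant
        Variant("P1", 42, "L", "P", True),   # type 2: same protein P1
        Variant("P3", 7, "A", "V", False)]   # no overlap with training data
print(circularity_report(train, test))
# {'type1_same_variant': 1, 'type2_same_protein': 1, 'clean': 1}
```

Keying the split on proteins rather than on individual variants is the stricter choice: it removes type 2 overlap as well as type 1, at the cost of a smaller evaluation set. Matching variants and proteins across real datasets also requires a shared identifier and coordinate scheme, which this sketch takes for granted.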

Similar articles

REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants.

Am J Hum Genet. 2016 Oct 6;99(4):877-885. doi: 10.1016/j.ajhg.2016.08.016. Epub 2016 Sep 22.

Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies.

Hum Mol Genet. 2015 Apr 15;24(8):2125-37. doi: 10.1093/hmg/ddu733. Epub 2014 Dec 30.

The CYSMA web server: An example of integrative tool for in silico analysis of missense variants identified in Mendelian disorders.

Hum Mutat. 2020 Feb;41(2):375-386. doi: 10.1002/humu.23941. Epub 2019 Nov 15.

Comparison of Predictive Tools on Missense Variants in , , and Genes Associated with Autosomal Recessive Deafness 1A (DFNB1A).

ScientificWorldJournal. 2019 Mar 20;2019:5198931. doi: 10.1155/2019/5198931. eCollection 2019.

[Evaluation of performance of five bioinformatics software for the prediction of missense mutations].

Zhonghua Yi Xue Yi Chuan Xue Za Zhi. 2016 Oct;33(5):625-8. doi: 10.3760/cma.j.issn.1003-9406.2016.05.009.

Using SIFT and PolyPhen to predict loss-of-function and gain-of-function mutations.

Genet Test Mol Biomarkers. 2010 Aug;14(4):533-7. doi: 10.1089/gtmb.2010.0036.

Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics.

BMC Med Genomics. 2018 Mar 27;11(1):35. doi: 10.1186/s12920-018-0353-y.

PredictSNP: robust and accurate consensus classifier for prediction of disease-related mutations.

PLoS Comput Biol. 2014 Jan;10(1):e1003440. doi: 10.1371/journal.pcbi.1003440. Epub 2014 Jan 16.

Assessment of 13 in silico pathogenicity methods on cancer-related variants.

Comput Biol Med. 2022 Jun;145:105434. doi: 10.1016/j.compbiomed.2022.105434. Epub 2022 Mar 26.

Cited by

Language Modelling Techniques for Analysing the Impact of Human Genetic Variation.

Bioinform Biol Insights. 2025 Sep 2;19:11779322251358314. doi: 10.1177/11779322251358314. eCollection 2025.

Creating an atlas of variant effects to resolve variants of uncertain significance and guide cardiovascular medicine.

Nat Rev Cardiol. 2025 Sep 1. doi: 10.1038/s41569-025-01201-7.

Assessing variant effect predictors and disease mechanisms in intrinsically disordered proteins.

PLoS Comput Biol. 2025 Aug 19;21(8):e1013400. doi: 10.1371/journal.pcbi.1013400. eCollection 2025 Aug.

Assessing the performance of 28 pathogenicity prediction methods on rare single nucleotide variants in coding regions.

BMC Genomics. 2025 Jul 7;26(1):641. doi: 10.1186/s12864-025-11787-4.

PRP: pathogenic risk prediction for rare nonsynonymous single nucleotide variants.

Hum Genet. 2025 May 29. doi: 10.1007/s00439-025-02751-z.

Missense variants pathogenicity annotation from homologous proteins.

Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf305.

A classification-occupancy model based on automatically identified species data.

Ecology. 2025 May;106(5):e70086. doi: 10.1002/ecy.70086.

Evaluating variant pathogenicity prediction tools to establish African inclusive guidelines for germline genetic testing.

Commun Med (Lond). 2025 May 6;5(1):157. doi: 10.1038/s43856-025-00883-x.

Accurate identification and mechanistic evaluation of pathogenic missense variants with .

Proc Natl Acad Sci U S A. 2025 May 6;122(18):e2418100122. doi: 10.1073/pnas.2418100122. Epub 2025 May 2.

Variant effect predictor correlation with functional assays is reflective of clinical classification performance.

Genome Biol. 2025 Apr 22;26(1):104. doi: 10.1186/s13059-025-03575-w.
