PON-P2: prediction method for fast and reliable identification of harmful variants.

Author information

Niroula Abhishek, Urolagin Siddhaling, Vihinen Mauno

Affiliations

Department of Experimental Medical Science, Lund University, Lund, Sweden.

Publication information

PLoS One. 2015 Feb 3;10(2):e0117380. doi: 10.1371/journal.pone.0117380. eCollection 2015.

DOI:10.1371/journal.pone.0117380

PMID:25647319

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4315405/

Abstract

More reliable and faster prediction methods are needed to interpret the enormous amounts of data generated by sequencing and genome projects. We have developed a new computational tool, PON-P2, for classifying amino acid substitutions in human proteins. The method is a machine learning-based classifier that groups variants into pathogenic, neutral, and unknown classes on the basis of a random forest probability score. PON-P2 is trained on pathogenic and neutral variants obtained from VariBench, a database of benchmark variation datasets. It uses information about the evolutionary conservation of sequences, the physical and biochemical properties of amino acids, GO annotations and, if available, functional annotations of variation sites. Extensive feature selection identified 8 informative features from a total of 622. PON-P2 consistently outperformed existing state-of-the-art tools. In 10-fold cross-validation, its accuracy and Matthews correlation coefficient (MCC) are 0.90 and 0.80, respectively; in the independent test, they are 0.86 and 0.71. The coverage of PON-P2 is 61.7% in the 10-fold cross-validation and 62.1% in the test dataset. PON-P2 is a powerful tool for screening harmful variants and for ranking and prioritizing variants for experimental characterization. It is very fast, which makes it capable of analyzing large variant datasets. PON-P2 is freely available at http://structure.bmc.lu.se/PON-P2/.
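
The three-class scheme and the reported metrics (accuracy, MCC, coverage) can be illustrated with a minimal Python sketch. It is not the PON-P2 implementation. The abstract does not say how the unknown class is decided, so the fixed `margin` around 0.5 is a hypothetical rule, as are the function names `classify` and `evaluate`. The sketch also assumes that accuracy and MCC are computed only over variants that received a pathogenic or neutral label, and that coverage is the fraction of variants that received one.

```python
import math


def classify(prob_pathogenic, margin=0.2):
    """Map a probability of pathogenicity to one of three labels.

    The margin is a hypothetical cutoff, not the rule used by PON-P2:
    scores within `margin` of 0.5 are treated as too uncertain to call.
    """
    if prob_pathogenic >= 0.5 + margin:
        return "pathogenic"
    if prob_pathogenic <= 0.5 - margin:
        return "neutral"
    return "unknown"


def evaluate(probs, truths, margin=0.2):
    """Return (accuracy, mcc, coverage) for a list of scored variants.

    Assumption: unknown predictions are excluded from accuracy and MCC
    and only lower the coverage.
    """
    tp = tn = fp = fn = 0
    for prob, truth in zip(probs, truths):
        label = classify(prob, margin)
        if label == "unknown":
            continue
        if label == "pathogenic":
            if truth == "pathogenic":
                tp += 1
            else:
                fp += 1
        else:
            if truth == "neutral":
                tn += 1
            else:
                fn += 1

    classified = tp + tn + fp + fn
    coverage = classified / len(probs) if probs else 0.0
    accuracy = (tp + tn) / classified if classified else 0.0

    # Matthews correlation coefficient from the confusion matrix;
    # defined as 0.0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc, coverage


# Toy scores and labels, invented purely for illustration.
probs = [0.95, 0.80, 0.55, 0.10, 0.25, 0.45, 0.90, 0.05]
truths = ["pathogenic", "pathogenic", "pathogenic", "neutral",
          "neutral", "neutral", "neutral", "pathogenic"]
accuracy, mcc, coverage = evaluate(probs, truths)
```

Widening `margin` in this sketch moves more variants into the unknown class, which lowers coverage and typically raises accuracy on the remaining calls. That trade-off is why coverage is reported alongside accuracy and MCC.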
