Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.

Author information

Capriotti Emidio, Arbiza Leonardo, Casadio Rita, Dopazo Joaquín, Dopazo Hernán, Marti-Renom Marc A

Affiliation

Structural Genomics Unit, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain.

Publication information

Hum Mutat. 2008 Jan;29(1):198-204. doi: 10.1002/humu.20628.

DOI:10.1002/humu.20628

PMID:17935148

Abstract

Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Until now, existing methods have limited their source data to either protein or gene information. The novelty of this work is that we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergistic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007) could be used to support clinical studies.
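The abstract reports two benchmark figures for a binary classifier: overall accuracy and a correlation coefficient. As a minimal sketch, assuming the correlation coefficient is the Matthews correlation coefficient commonly used for such classifiers (the abstract does not name it), both can be computed from a confusion matrix as below. The function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (overall accuracy, Matthews correlation coefficient).

    tp/tn/fp/fn are confusion-matrix counts for a binary classifier,
    e.g. disease-related (positive) vs. neutral (negative) variants.
    Assumption: "correlation coefficient" means the Matthews coefficient.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, report 0.0 when a row or column of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Made-up counts for illustration only; these are NOT the paper's results.
acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.2f}, correlation={mcc:.2f}")
```

Unlike accuracy, the correlation coefficient stays informative when the two classes are unbalanced, which may be why both figures are reported together.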

Similar articles

Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information.

Bioinformatics. 2006 Nov 15;22(22):2729-34. doi: 10.1093/bioinformatics/btl423. Epub 2006 Aug 7.

Selective pressures at a codon-level predict deleterious mutations in human disease genes.

J Mol Biol. 2006 May 19;358(5):1390-404. doi: 10.1016/j.jmb.2006.02.067. Epub 2006 Mar 15.

Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information.

Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3.

Detailed computational study of p53 and p16: using evolutionary sequence analysis and disease-associated mutations to predict the functional consequences of allelic variants.

Oncogene. 2003 Feb 27;22(8):1150-63. doi: 10.1038/sj.onc.1206101.

Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.

Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.

Prediction of function changes associated with single-point protein mutations using support vector machines (SVMs).

Hum Mutat. 2009 Aug;30(8):1161-6. doi: 10.1002/humu.21039.

Cataloging coding sequence variations in human genome databases.

PLoS One. 2008;3(10):e3575. doi: 10.1371/journal.pone.0003575. Epub 2008 Oct 30.

Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction.

BMC Bioinformatics. 2006 Nov 7;7:491. doi: 10.1186/1471-2105-7-491.

DPROT: prediction of disordered proteins using evolutionary information.

Amino Acids. 2008 Oct;35(3):599-605. doi: 10.1007/s00726-008-0085-y. Epub 2008 Apr 19.

Cited by

Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives.

Hum Genet. 2019 Feb;138(2):109-124. doi: 10.1007/s00439-019-01970-5. Epub 2019 Jan 22.

PMut: a web-based tool for the annotation of pathological variants on proteins, 2017 update.

Nucleic Acids Res. 2017 Jul 3;45(W1):W222-W228. doi: 10.1093/nar/gkx313.

Computational methods and resources for the interpretation of genomic variants in cancer.

BMC Genomics. 2015;16 Suppl 8(Suppl 8):S7. doi: 10.1186/1471-2164-16-S8-S7. Epub 2015 Jun 18.

PON-P2: prediction method for fast and reliable identification of harmful variants.

PLoS One. 2015 Feb 3;10(2):e0117380. doi: 10.1371/journal.pone.0117380. eCollection 2015.

The Clinical Significance of Unknown Sequence Variants in BRCA Genes.

Cancers (Basel). 2010 Sep 10;2(3):1644-60. doi: 10.3390/cancers2031644.

The role of balanced training and testing data sets for binary classifiers in bioinformatics.

PLoS One. 2013 Jul 9;8(7):e67863. doi: 10.1371/journal.pone.0067863. Print 2013.

Collective judgment predicts disease-associated single nucleotide variants.

BMC Genomics. 2013;14 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2164-14-S3-S2. Epub 2013 May 28.

WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation.

BMC Genomics. 2013;14 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2164-14-S3-S6. Epub 2013 May 28.

VARIANT: Command Line, Web service and Web interface for fast and accurate functional characterization of variants found by Next-Generation Sequencing.

Nucleic Acids Res. 2012 Jul;40(Web Server issue):W54-8. doi: 10.1093/nar/gks572. Epub 2012 Jun 11.

Bioinformatics and variability in drug response: a protein structural perspective.

J R Soc Interface. 2012 Jul 7;9(72):1409-37. doi: 10.1098/rsif.2011.0843. Epub 2012 May 2.
