Probabilistic annotation of protein sequences based on functional classifications.

Author information

Levy Emmanuel D, Ouzounis Christos A, Gilks Walter R, Audit Benjamin

Affiliation

Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.

Publication information

BMC Bioinformatics. 2005 Dec 14;6:302. doi: 10.1186/1471-2105-6-302.

DOI: 10.1186/1471-2105-6-302

PMID: 16354297

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1361783/

Abstract

BACKGROUND

One of the most evident achievements of bioinformatics is the development of methods that transfer biological knowledge from characterised proteins to uncharacterised sequences. This mode of protein function assignment is mostly based on the detection of sequence similarity and the premise that functional properties are conserved during evolution. Most automatic approaches developed to date rely on the identification of clusters of homologous proteins and the mapping of new proteins onto these clusters, which are expected to share functional characteristics.
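The cluster-based strategy described above can be sketched as follows. This is a hypothetical illustration, not code from the paper: it assumes the clusters of homologous proteins are already computed (`protein_to_cluster`), that `labels` holds the functions of characterised proteins, and that a similarity search has returned `(subject_id, bit_score)` pairs for the new sequence. The function name `cluster_transfer` and the majority-vote rule are assumptions made for illustration.

```python
from collections import Counter


def cluster_transfer(query_hits, protein_to_cluster, labels):
    """Hypothetical cluster-based annotation transfer.

    query_hits: (subject_id, bit_score) pairs from a similarity search.
    protein_to_cluster: dict mapping characterised protein ids to cluster ids.
    labels: dict mapping characterised protein ids to functional labels.
    Returns the majority label of the cluster containing the best hit,
    or None when no hit falls into a labelled cluster.
    """
    # Only hits that belong to some cluster can map the query onto a cluster.
    clustered = [(s, score) for s, score in query_hits if s in protein_to_cluster]
    if not clustered:
        return None
    best_subject, _ = max(clustered, key=lambda hit: hit[1])
    cluster = protein_to_cluster[best_subject]

    # The cluster is assumed to share one function: take a majority vote
    # (Counter.most_common breaks ties by first-seen order).
    members = [p for p, c in protein_to_cluster.items() if c == cluster and p in labels]
    if not members:
        return None
    return Counter(labels[p] for p in members).most_common(1)[0][0]


clusters = {"P1": "A", "P2": "A", "P3": "B"}
functions = {"P1": "1.1.1.1", "P2": "1.1.1.1", "P3": "2.7.1.1"}
print(cluster_transfer([("P3", 40.0), ("P2", 210.0)], clusters, functions))
```

Here the best hit (`P2`) maps the query onto cluster `A`, whose members share the label `1.1.1.1`, so the example prints `1.1.1.1`.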

RESULTS

Here, we invert the logic of this process by mapping sequences directly to a functional classification instead of mapping functions onto a sequence clustering. In this mode, the starting point is a database of proteins labelled according to a functional classification scheme, and sequence similarity is then used to define the membership of new proteins in these functional classes. Within this framework, we define the Correspondence Indicators as measures of the relationship between sequence and function, and we formulate two Bayesian approaches to estimate the probability that a sequence of unknown function belongs to a given functional class. This approach allows different sequence search strategies to be parametrised and provides a direct measure of annotation error rates. We validate it on a database of enzymes labelled with their four-digit EC numbers and analyse specific cases.
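The abstract does not give the formulas for the Correspondence Indicators or for the two Bayesian estimators, so the sketch below is not the authors' method. It only illustrates the general idea of calibrating annotation transfer on a labelled database, under stated assumptions: each labelled protein is searched against the rest of the database (leave-one-out), the bit score of its best hit is binned, and the probability that the best hit shares the query's functional class is estimated per bin as the posterior mean of a Beta(alpha, beta) prior. One minus that value is then an estimated error rate for that score range. The names `score_bin`, `calibrate` and `annotate`, the bin width of 50 bits and the uniform prior are all hypothetical choices.

```python
from collections import Counter


def score_bin(bit_score, width=50.0):
    """Hypothetical binning: group bit scores into fixed-width intervals."""
    return int(bit_score // width)


def calibrate(self_search, alpha=1.0, beta=1.0, width=50.0):
    """Estimate P(best hit shares the query's class | score bin).

    self_search: (query_class, best_hit_class, best_hit_score) tuples from an
    assumed leave-one-out search of a labelled database.
    Returns {bin: posterior mean of a Beta(alpha, beta) prior updated with
    the agreement counts observed in that bin}.
    """
    trials, successes = Counter(), Counter()
    for query_class, hit_class, score in self_search:
        b = score_bin(score, width)
        trials[b] += 1
        successes[b] += int(query_class == hit_class)
    return {b: (successes[b] + alpha) / (trials[b] + alpha + beta) for b in trials}


def annotate(best_hit_class, best_hit_score, calibration,
             width=50.0, alpha=1.0, beta=1.0):
    """Transfer the best hit's class together with a calibrated confidence."""
    b = score_bin(best_hit_score, width)
    # Bins never seen during calibration fall back to the prior mean.
    confidence = calibration.get(b, alpha / (alpha + beta))
    return best_hit_class, confidence


labelled = [
    ("1.1.1.1", "1.1.1.1", 320.0),
    ("1.1.1.1", "1.1.1.1", 340.0),
    ("2.7.1.1", "2.7.1.2", 60.0),
    ("3.1.1.1", "3.1.1.1", 70.0),
]
cal = calibrate(labelled)
print(annotate("4.2.1.1", 330.0, cal))
```

With these toy counts, the 300 to 350 bit bin has two agreements out of two, giving (2 + 1) / (2 + 2) = 0.75, so the example prints `('4.2.1.1', 0.75)` and the estimated error rate for that bin is 0.25. A prior keeps sparsely populated bins from reporting overconfident 0 or 1 estimates.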

CONCLUSION

The performance of this method is significantly higher than that of the simple strategy of transferring the annotation from the highest-scoring BLAST match, and the method is expected to find applications in automated functional annotation pipelines.
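For reference, the best-BLAST-hit baseline mentioned above can be sketched as below. It assumes the search results are already parsed into `(subject_id, bit_score)` pairs (for example from BLAST tabular output, `-outfmt 6`, where the bit score is the last of the twelve default columns) and that `labels` maps database entries to EC numbers; the function name `best_hit_transfer` is hypothetical.

```python
def best_hit_transfer(hits, labels):
    """Baseline: copy the label of the highest-scoring labelled hit.

    hits: (subject_id, bit_score) pairs for one query sequence.
    labels: dict mapping subject ids to functional labels (e.g. EC numbers).
    Returns None when no hit carries a label.
    """
    # Unlabelled subjects cannot donate an annotation, so skip them.
    labelled = [(subject, score) for subject, score in hits if subject in labels]
    if not labelled:
        return None
    best_subject, _ = max(labelled, key=lambda hit: hit[1])
    return labels[best_subject]


ec_numbers = {"S1": "1.1.1.1", "S2": "2.7.1.1"}
print(best_hit_transfer([("S1", 150.0), ("S2", 80.0), ("S9", 500.0)], ec_numbers))
```

The example prints `1.1.1.1`: `S9` scores highest but has no label. Note that this baseline returns a label with no accompanying probability or error-rate estimate.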

Similar articles

Probabilistic annotation of protein sequences based on functional classifications.

BMC Bioinformatics. 2005 Dec 14;6:302. doi: 10.1186/1471-2105-6-302.

CORRIE: enzyme sequence annotation with confidence estimates.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S3. doi: 10.1186/1471-2105-8-S4-S3.

A sequence alignment-independent method for protein classification.

Appl Bioinformatics. 2004;3(2-3):137-48. doi: 10.2165/00822942-200403020-00008.

Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures.

PLoS Comput Biol. 2009 Mar;5(3):e1000331. doi: 10.1371/journal.pcbi.1000331. Epub 2009 Mar 27.

Application of a simple likelihood ratio approximant to protein sequence classification.

Bioinformatics. 2006 Dec 1;22(23):2865-9. doi: 10.1093/bioinformatics/btl512. Epub 2006 Nov 7.

A structure and evolution-guided Monte Carlo sequence selection strategy for multiple alignment-based analysis of proteins.

Bioinformatics. 2006 Jan 15;22(2):149-56. doi: 10.1093/bioinformatics/bti791. Epub 2005 Nov 22.

Euclidian space and grouping of biological objects.

Bioinformatics. 2002 Nov;18(11):1523-34. doi: 10.1093/bioinformatics/18.11.1523.

Blast sampling for structural and functional analyses.

BMC Bioinformatics. 2007 Feb 23;8:62. doi: 10.1186/1471-2105-8-62.

Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes.

Bioinformatics. 2005 Jan 1;21(1):10-9. doi: 10.1093/bioinformatics/bth466. Epub 2004 Aug 12.

MACHOS: Markov clusters of homologous subsequences.

Bioinformatics. 2008 Jul 1;24(13):i77-85. doi: 10.1093/bioinformatics/btn144.

Cited by

Propagation, detection and correction of errors using the sequence database network.

Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac416.

Primer on the Gene Ontology.

Methods Mol Biol. 2017;1446:25-37. doi: 10.1007/978-1-4939-3743-1_3.

Computational Approaches for Automated Classification of Enzyme Sequences.

J Proteomics Bioinform. 2011 Aug 23;4:147-152. doi: 10.4172/jpb.1000183.

The what, where, how and why of gene ontology--a primer for bioinformaticians.

Brief Bioinform. 2011 Nov;12(6):723-35. doi: 10.1093/bib/bbr002. Epub 2011 Feb 17.

Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

PLoS One. 2010 Dec 13;5(12):e14286. doi: 10.1371/journal.pone.0014286.

Sequence-based feature prediction and annotation of proteins.

Genome Biol. 2009 Feb 2;10(2):206. doi: 10.1186/gb-2009-10-2-206.

Towards a semi-automatic functional annotation tool based on decision-tree techniques.

BMC Proc. 2008 Dec 17;2 Suppl 4(Suppl 4):S3. doi: 10.1186/1753-6561-2-s4-s3.

ProbCD: enrichment analysis accounting for categorization uncertainty.

BMC Bioinformatics. 2007 Oct 12;8:383. doi: 10.1186/1471-2105-8-383.

Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach.

BMC Bioinformatics. 2007 Aug 3;8:284. doi: 10.1186/1471-2105-8-284.

Applying negative rule mining to improve genome annotation.

BMC Bioinformatics. 2007 Jul 21;8:261. doi: 10.1186/1471-2105-8-261.
