Incorporating functional inter-relationships into protein function prediction algorithms.

Authors

Pandey Gaurav, Myers Chad L, Kumar Vipin

Affiliation

Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA.

Publication

BMC Bioinformatics. 2009 May 12;10:142. doi: 10.1186/1471-2105-10-142.

DOI:10.1186/1471-2105-10-142

PMID:19435516

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2693438/

Abstract

BACKGROUND

Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold-standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have used more than the protein-to-functional-class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. For standard classification-based approaches, these inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms.
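
To make the DAG structure concrete, the following minimal Python sketch shows how annotations to a specific class also imply membership in every ancestor class, which is one way larger related classes can share examples with smaller ones. The term names (`t1` to `t4`), the `PARENTS` table, and the helper functions are hypothetical illustrations, not real GO terms or code from the paper.

```python
# Hypothetical toy ontology (not real GO data): each term maps to its parents.
# "t4" has two parents, so the structure is a DAG rather than a tree.
PARENTS = {
    "root": set(),
    "t1": {"root"},
    "t2": {"root"},
    "t3": {"t1"},
    "t4": {"t2", "t3"},
}


def ancestors(term):
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations):
    """Expand each protein's direct annotations to include all ancestor terms."""
    return {
        protein: set().union(*(ancestors(t) for t in terms))
        for protein, terms in direct_annotations.items()
    }


# Hypothetical protein identifiers, used only for illustration.
direct = {"P1": {"t4"}, "P2": {"t1"}}
full = propagate(direct)
```

Here `P1`, annotated only to `t4`, also becomes an example for `t3`, `t2`, `t1`, and `root` after propagation.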

RESULTS

We propose a method to enhance the performance of classification-based protein function prediction algorithms by exploiting the inter-relationships between the functional classes that constitute a functional classification scheme. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify these inter-relationships and incorporate them into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used to integrate information from the entire GO hierarchy to improve the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1.
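
The idea of folding class similarity into a k-nearest neighbor classifier can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes Lin's measure as the semantic similarity, a toy ontology with made-up term probabilities (`PARENTS`, `PROB`), Euclidean distance between feature vectors, and a simple scoring rule in which each neighbor contributes the best similarity between its labels and the target class. The abstract does not specify any of these details.

```python
import math

# Hypothetical toy ontology: term -> parents, and term -> fraction of
# proteins annotated to it (made-up values, not real GO statistics).
PARENTS = {"root": set(), "t1": {"root"}, "t2": {"root"}, "t3": {"t1"}, "t4": {"t1"}}
PROB = {"root": 1.0, "t1": 0.5, "t2": 0.5, "t3": 0.2, "t4": 0.1}


def ancestors(term):
    """Return the term itself plus all terms reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lin_similarity(a, b):
    """Lin's measure: 2*log p(c) / (log p(a) + log p(b)), where c is the
    lowest-probability (most informative) common ancestor of a and b."""
    denom = math.log(PROB[a]) + math.log(PROB[b])
    if denom == 0.0:  # both terms have probability 1, e.g. the root itself
        return 1.0 if a == b else 0.0
    p_common = min(PROB[t] for t in ancestors(a) & ancestors(b))
    return 2.0 * math.log(p_common) / denom


def knn_score(query, training, target, k=2):
    """Score class `target` for `query`: each of the k nearest training
    examples contributes the best Lin similarity between its labels and
    `target`, so related classes count as partial votes.
    Assumes every training example has at least one label."""
    nearest = sorted(training, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = (max(lin_similarity(target, lab) for lab in labels) for _, labels in nearest)
    return sum(votes) / k


# Hypothetical training set: (feature vector, set of class labels).
training = [((0.0, 0.0), {"t3"}), ((0.1, 0.0), {"t4"}), ((5.0, 5.0), {"t2"})]
score_t4 = knn_score((0.0, 0.1), training, "t4")
score_t2 = knn_score((0.0, 0.1), training, "t2")
```

With a plain 0/1 vote, only one of the two nearest neighbors would support `t4`; here the neighbor labeled with the sibling class `t3` adds a partial vote, which is the kind of help a small class can receive from related classes.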

CONCLUSION

We implemented and evaluated a methodology for incorporating interrelationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/.
