Incorporating functional inter-relationships into protein function prediction algorithms.

Author information

Pandey Gaurav, Myers Chad L, Kumar Vipin

Affiliation

Department of Computer Science & Engineering, University of Minnesota, Minneapolis, MN, USA.

Publication information

BMC Bioinformatics. 2009 May 12;10:142. doi: 10.1186/1471-2105-10-142.

Abstract

BACKGROUND

Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have used anything beyond the protein-to-functional-class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. For standard classification-based approaches, these inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms.
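
The opportunity described above rests on classes in the DAG sharing members. The minimal sketch below, built on an invented six-term ontology and invented proteins, shows one common way this is made explicit: each protein's annotation is propagated from its class to every ancestor class (the "true path" convention). The abstract does not state that the authors use exactly this procedure, and the names PARENTS, ANNOTATIONS, ancestors and propagate are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its parent terms (a DAG).
PARENTS = {
    "root": [],
    "transport": ["root"],
    "metabolism": ["root"],
    "ion_transport": ["transport"],
    "protein_transport": ["transport"],
    "vacuolar_transport": ["protein_transport"],
}

# Hypothetical direct annotations: protein -> its most specific classes.
ANNOTATIONS = {
    "P1": {"vacuolar_transport"},
    "P2": {"protein_transport"},
    "P3": {"ion_transport"},
    "P4": {"metabolism"},
}


def ancestors(term, parents=PARENTS):
    """Return the term together with every ancestor reachable via parent links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def propagate(annotations, parents=PARENTS):
    """Map each class to the proteins annotated to it or to any of its descendants."""
    members = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            for anc in ancestors(term, parents):
                members[anc].add(protein)
    return members


members = propagate(ANNOTATIONS)
for term in sorted(members):
    print(f"{term}: {sorted(members[term])}")
```

Here the one-member class vacuolar_transport sits under protein_transport, which has two members, so a method that knows the two classes are related has more examples to draw on than one that treats every class independently.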

RESULTS

We propose a method to enhance the performance of classification-based protein function prediction algorithms by exploiting the inter-relationships between the functional classes that make up functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify these inter-relationships and incorporate them into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used to integrate information from the entire GO hierarchy to improve the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1.
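
The abstract says only that class inter-relationships are quantified with "a standard measure" of ontology semantic similarity and folded into a k-nearest neighbor classifier. The sketch below illustrates one way this could work, assuming Lin's information-content-based similarity and a scoring rule in which each neighbor contributes to class c in proportion to the most similar of its own labels. The toy ontology, proteins, similarity values and the functions ic, lin and knn_scores are hypothetical, and the scoring rule is an illustration, not the authors' exact formulation.

```python
import math

# Hypothetical toy ontology (term -> parent terms) and training labels; not real GO data.
PARENTS = {
    "root": [],
    "transport": ["root"],
    "protein_transport": ["transport"],
    "vacuolar_transport": ["protein_transport"],
    "metabolism": ["root"],
}
LABELS = {
    "A": {"vacuolar_transport"},
    "B": {"protein_transport"},
    "C": {"transport"},
    "D": {"metabolism"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


# Class sizes after propagating each protein's labels to all ancestor classes.
SIZE = {t: 0 for t in PARENTS}
for terms in LABELS.values():
    for t in set().union(*(ancestors(x) for x in terms)):
        SIZE[t] += 1


def ic(term):
    """Information content -log p(term), with p estimated as SIZE[term] / SIZE[root]."""
    return math.log(SIZE["root"] / SIZE[term])


def lin(c1, c2):
    """Lin's similarity: 2 * IC(most informative common ancestor) / (IC(c1) + IC(c2))."""
    mica = max(ic(t) for t in ancestors(c1) & ancestors(c2))
    denom = ic(c1) + ic(c2)
    return 1.0 if denom == 0 else 2 * mica / denom


def knn_scores(neighbors, classes, use_class_similarity=True):
    """Score classes for one query from its k nearest neighbors.

    neighbors: (protein, similarity to query) pairs for the k nearest training proteins.
    Plain kNN credits a neighbor only to its own labels; the class-aware variant also
    credits it to class c, weighted by the Lin similarity between c and its closest label.
    """
    scores = {}
    for c in classes:
        total = 0.0
        for protein, s in neighbors:
            if use_class_similarity:
                total += s * max((lin(c, label) for label in LABELS[protein]), default=0.0)
            else:
                total += s * (c in LABELS[protein])
        scores[c] = total
    return scores


# Hypothetical top-3 neighbors of a query protein with made-up similarity values.
neighbors = [("B", 0.9), ("C", 0.6), ("D", 0.2)]
classes = ["vacuolar_transport", "protein_transport", "metabolism"]
plain = knn_scores(neighbors, classes, use_class_similarity=False)
aware = knn_scores(neighbors, classes)
print(plain)
print(aware)
```

Because Lin's measure gives a class a similarity of 1 with itself and is never negative here, the class-aware score is never lower than the plain score in this sketch. The gain goes to classes such as vacuolar_transport, which no neighbor carries directly and which therefore scores zero under plain kNN, in line with the abstract's report that the smallest classes benefit most.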

CONCLUSION

We implemented and evaluated a methodology for incorporating inter-relationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms and uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/.
