A New Feature Vector Based on Gene Ontology Terms for Protein-Protein Interaction Prediction.

Author information

Bandyopadhyay Sanghamitra, Mallick Koushik

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2017 Jul-Aug;14(4):762-770. doi: 10.1109/TCBB.2016.2555304. Epub 2016 Apr 20.

Abstract

Protein-protein interaction (PPI) plays a key role in understanding cellular mechanisms in different organisms. Many supervised classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), have been used for intra- or inter-species interaction prediction. To improve prediction performance, in this paper we propose a novel set of features that represents a protein pair using the proteins' annotated Gene Ontology (GO) terms, including the ancestors of those terms. In our approach, a protein pair is treated as a document (bag of words), where the terms annotating the two proteins represent the words. The feature value of each word is calculated as the information content of the corresponding term multiplied by a coefficient, which represents the weight of that term inside a document (i.e., a protein pair). We have tested the performance of the classifier using the proposed feature on different well-known data sets of different species, namely S. cerevisiae, H. sapiens, E. coli, and D. melanogaster. We compare it with the other GO-based feature representation technique, and demonstrate its competitive performance.
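
The bag-of-words idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the coefficient or how information content is estimated, so the sketch assumes that information content is the negative log of a term's annotation frequency and that the coefficient is the fraction of the two proteins annotated with the term (0, 0.5, or 1). The GO identifiers, protein names, and function names are all hypothetical toy values.

```python
import math

# Toy GO fragment (hypothetical term IDs): each term maps to its parent terms.
PARENTS = {
    "GO:A": set(),
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B"},
}

# Toy direct annotations (hypothetical proteins).
ANNOTATIONS = {
    "P1": {"GO:D"},
    "P2": {"GO:C"},
    "P3": {"GO:B"},
}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def information_content(annotations):
    """Assumed IC: -log of the fraction of proteins annotated with a term."""
    expanded = [with_ancestors(terms) for terms in annotations.values()]
    total = len(expanded)
    ic = {}
    for term in PARENTS:
        count = sum(term in terms for terms in expanded)
        # Terms that annotate no protein never contribute to a feature vector.
        ic[term] = -math.log(count / total) if count else 0.0
    return ic


def pair_features(protein_a, protein_b, vocabulary, ic):
    """Feature vector for a protein pair: IC times an assumed weight per term."""
    terms_a = with_ancestors(ANNOTATIONS[protein_a])
    terms_b = with_ancestors(ANNOTATIONS[protein_b])
    features = []
    for term in vocabulary:
        # Assumed coefficient: share of the two proteins carrying this term.
        weight = ((term in terms_a) + (term in terms_b)) / 2
        features.append(ic[term] * weight)
    return features


VOCABULARY = sorted(PARENTS)
IC = information_content(ANNOTATIONS)
vector = pair_features("P1", "P2", VOCABULARY, IC)
```

Each protein pair yields one fixed-length vector over the GO vocabulary, which could then be passed to a classifier such as RF or SVM. In this toy setup the root term receives a value of zero because it annotates every protein and so carries no information.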

