

Identifying protein complexes using hybrid properties.

Affiliation

Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, People's Republic of China.

Publication

J Proteome Res. 2009 Nov;8(11):5212-8. doi: 10.1021/pr900554a.

Abstract

Protein complexes, which integrate multiple gene products, perform a wide range of fundamental biological functions in cells. Much effort has been devoted to identifying protein complexes computationally, and the vast majority of approaches search for densely connected regions in protein-protein interaction (PPI) networks (graphs). In this research, we take an alternative approach: we analyze protein complexes using hybrid features and present a method to determine whether multiple (more than two) proteins from yeast can form a protein complex. The data set consists of 493 positive protein complexes and 9878 negative protein complexes. Each complex is represented by two kinds of features: graph features, derived from the graph (web) of interactions formed by the proteins in the complex, and features derived from biological properties, including protein length, biochemical properties and physicochemical properties. These features are filtered and optimized with the Minimum Redundancy Maximum Relevance (mRMR) method, Incremental Feature Selection (IFS) and Forward Feature Selection, with the Nearest Neighbor Algorithm serving as the prediction/identification model. The jackknife cross-validation test is used to evaluate identification accuracy. Using the filtered features, the highest accuracy for identifying real protein complexes is 69.17%, and feature analysis shows that, among the adopted features, graph features play the main role in determining protein complexes.
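The abstract does not list the individual graph features, so the following is only an illustrative sketch of the kind of feature that can be computed when the proteins of a candidate complex are treated as an induced subgraph of a PPI network. The function name `graph_features`, the edge representation (a set of unordered protein pairs), and the chosen features (edge count, density, mean and maximum degree) are our own assumptions, not the paper's feature set.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def graph_features(complex_proteins: Iterable[str],
                   ppi_edges: Set[FrozenSet[str]]) -> Dict[str, float]:
    """Hypothetical graph features of the PPI subgraph induced by a candidate complex.

    ppi_edges holds undirected interactions as frozenset({protein_a, protein_b}).
    """
    nodes = sorted(set(complex_proteins))
    n = len(nodes)
    degree = {p: 0 for p in nodes}
    edges = 0
    # Count interactions between every pair of proteins inside the complex.
    for a, b in combinations(nodes, 2):
        if frozenset((a, b)) in ppi_edges:
            edges += 1
            degree[a] += 1
            degree[b] += 1
    possible = n * (n - 1) / 2  # number of edges in a complete graph on n nodes
    return {
        "n_proteins": float(n),
        "n_edges": float(edges),
        "density": edges / possible if possible else 0.0,
        "mean_degree": 2 * edges / n if n else 0.0,
        "max_degree": float(max(degree.values(), default=0)),
    }


if __name__ == "__main__":
    # Toy network with placeholder protein names (not real yeast data).
    edges = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "D"))}
    print(graph_features(["A", "B", "C"], edges))
```

Density in particular captures the "densely connected region" idea that most earlier methods rely on, which is one reason it is a natural candidate among graph features.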
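The selection pipeline named in the abstract (mRMR ranking, then Incremental Feature Selection scored by a nearest-neighbour classifier under jackknife testing) can be sketched roughly as below. This is a minimal sketch under our own assumptions, not the authors' implementation: features are assumed to be already discretized so that mutual information can be estimated by simple counting, the classifier is a plain 1-nearest-neighbour rule with Euclidean distance (the paper's Nearest Neighbor Algorithm may use a different metric), the Forward Feature Selection step is omitted, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_info(x: Sequence, y: Sequence) -> float:
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_order(X: List[List[float]], y: Sequence) -> List[int]:
    """Rank features incrementally by relevance minus mean redundancy (mRMR)."""
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    relevance = [mutual_info(col, y) for col in cols]
    selected: List[int] = []
    remaining = list(range(len(cols)))
    while remaining:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        best = max(remaining, key=score)  # ties go to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


def jackknife_nn_accuracy(X: List[List[float]], y: Sequence,
                          features: List[int]) -> float:
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbour classifier."""
    sub = [[row[j] for j in features] for row in X]
    correct = 0
    for i in range(len(sub)):
        # Predict sample i from its nearest neighbour among all other samples.
        nearest = min((k for k in range(len(sub)) if k != i),
                      key=lambda k: math.dist(sub[i], sub[k]))
        correct += y[nearest] == y[i]
    return correct / len(sub)


def incremental_feature_selection(X: List[List[float]],
                                  y: Sequence) -> Tuple[List[int], float]:
    """IFS: evaluate the top-1, top-2, ... mRMR features and keep the best prefix."""
    order = mrmr_order(X, y)
    best_acc, best_k = -1.0, 0
    for k in range(1, len(order) + 1):
        acc = jackknife_nn_accuracy(X, y, order[:k])
        if acc > best_acc:
            best_acc, best_k = acc, k
    return order[:best_k], best_acc


if __name__ == "__main__":
    # Toy data: feature 0 matches the label, features 1 and 2 are noise.
    X = [[0, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
    y = [0, 0, 0, 1, 1, 1]
    print(incremental_feature_selection(X, y))  # ([0], 1.0)
```

Because jackknife testing retrains on n - 1 samples for every sample, its cost grows quadratically with a brute-force nearest-neighbour search; on a data set the size of the one described (about 10,000 candidate complexes) a spatial index or vectorized distance computation would be advisable.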
