Predicting protein complex in protein interaction network - a supervised learning based method.

Author information

Yu Feng, Yang Zhi, Tang Nan, Lin Hong, Wang Jian, Yang Zhi

Publication information

BMC Syst Biol. 2014;8 Suppl 3(Suppl 3):S4. doi: 10.1186/1752-0509-8-S3-S4. Epub 2014 Oct 22.

Abstract

BACKGROUND

Protein complexes are important for understanding the principles of cellular organization and function. High-throughput experimental techniques have produced a large number of protein interactions, making it possible to predict protein complexes from protein-protein interaction networks (PINs). However, most current methods are based on unsupervised learning and cannot exploit the large number of known complexes that are available.

METHODS

We present a supervised learning-based method for predicting protein complexes in protein-protein interaction networks. The method extracts rich features from both the unweighted and weighted networks to train a regression model, which is then used in three steps: filtering the initial cliques, growing them, and filtering the resulting candidate complexes. The model is trained with additional "uncertainty" samples, which makes it more discriminative when used in the complex detection algorithm. In addition, the method takes the maximal cliques found by the Cliques algorithm as its initial cliques, which has been shown to be more effective than expanding from seed proteins, as other methods do.
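The three steps named above (clique filtering, growth, candidate filtering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_score` is a hypothetical stand-in for the trained regression model, the thresholds `seed_min` and `final_min` are arbitrary, and maximal cliques are enumerated with a plain Bron-Kerbosch routine rather than the Cliques algorithm the abstract refers to.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def toy_score(nodes, adj):
    """Hypothetical stand-in for the regression model's output in [0, 1]:
    the mean of edge density and the share of edges that stay inside."""
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(len(adj[v] & nodes) for v in nodes) / 2
    boundary = sum(len(adj[v] - nodes) for v in nodes)
    density = internal / (n * (n - 1) / 2)
    inside_ratio = internal / (internal + boundary)
    return (density + inside_ratio) / 2


def grow(seed, adj, score):
    """Greedily add the neighbour that most improves the score."""
    current = set(seed)
    while True:
        frontier = set().union(*(adj[v] for v in current)) - current
        best, best_score = None, score(current, adj)
        for u in sorted(frontier):
            s = score(current | {u}, adj)
            if s > best_score:
                best, best_score = u, s
        if best is None:
            return frozenset(current)
        current.add(best)


def detect_complexes(adj, score, seed_min=0.7, final_min=0.8, min_size=3):
    # Step 1: keep only maximal cliques that are large and well scored.
    seeds = [c for c in maximal_cliques(adj)
             if len(c) >= min_size and score(c, adj) >= seed_min]
    # Step 2: grow each surviving clique.
    candidates = {grow(c, adj, score) for c in seeds}
    # Step 3: filter the grown candidates with the same scorer.
    return {c for c in candidates if score(c, adj) >= final_min}


# Toy interaction network (hypothetical protein names A..F).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
         ("B", "D"), ("C", "E"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

complexes = detect_complexes(adj, toy_score)
print(sorted(sorted(c) for c in complexes))
```

On this toy network the two triangles {A, B, C} and {A, B, D} both survive the seed filter and both grow into the same candidate, so a single predicted complex is reported. In a real setting the scorer would be the trained model applied to features of the candidate subgraph.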

RESULTS

Experimental results on several PIN datasets show that, in most cases, our method outperforms comparable state-of-the-art protein complex detection techniques.

CONCLUSIONS

The results demonstrate several advantages of our method over other state-of-the-art techniques. First, our method is based on supervised learning, so it can make full use of the available known complexes instead of relying only on the topological structure of the PIN. This also means that, given more training samples, our method can achieve better performance than unsupervised methods. Second, we design a rich feature set to describe the properties of known complexes, which includes features not only from the unweighted network but also from the weighted network built from Gene Ontology information. Third, our regression model is trained with additional "uncertainty" samples and therefore becomes more discriminative; the experimental results indicate that this is effective for complex detection.
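To make the idea of a feature set drawn from both networks concrete, the sketch below computes a few descriptors of a candidate subgraph. The specific features (size, density, mean degree, weighted density, mean edge weight) are illustrative assumptions, not the feature set used in the paper, and the `weight` dictionary is a hypothetical stand-in for edge weights derived from Gene Ontology information.

```python
def subgraph_features(nodes, adj, weight):
    """Describe a candidate subgraph with unweighted and weighted features.

    adj    : dict mapping each protein to the set of its interaction partners
    weight : dict mapping frozenset({u, v}) to an edge weight in [0, 1]
             (hypothetical; e.g. a GO-based similarity of u and v)
    Node labels must be mutually comparable (e.g. all strings).
    """
    nodes = set(nodes)
    n = len(nodes)
    pairs = n * (n - 1) / 2
    # Each internal edge is listed once by requiring u < v.
    edges = [(u, v) for u in nodes for v in adj[u] & nodes if u < v]
    w = [weight[frozenset((u, v))] for u, v in edges]
    return {
        # Features from the unweighted network.
        "size": n,
        "density": len(edges) / pairs if pairs else 0.0,
        "mean_degree": 2 * len(edges) / n if n else 0.0,
        # Features from the weighted network.
        "weighted_density": sum(w) / pairs if pairs else 0.0,
        "mean_edge_weight": sum(w) / len(w) if w else 0.0,
    }


# Toy data: a triangle A-B-C with one extra edge A-D (hypothetical weights).
weight = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.6,
    frozenset(("B", "C")): 0.3,
    frozenset(("A", "D")): 0.2,
}
adj = {}
for pair in weight:
    u, v = sorted(pair)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

features = subgraph_features({"A", "B", "C"}, adj, weight)
print(features)
```

Feature vectors of this kind, computed for known complexes and for non-complex subgraphs, would form the training set for the regression model; how the "uncertainty" samples are constructed and labelled is not specified in the abstract and is left out here.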

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2841/4243764/0912f5ec45aa/1752-0509-8-S3-S4-1.jpg