Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
Bioinformatics Laboratory and National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101.
Proteome Sci. 2011 Oct 14;9 Suppl 1(Suppl 1):S18. doi: 10.1186/1477-5956-9-S1-S18.
Identifying biologically relevant protein complexes from a large protein-protein interaction (PPI) network is essential to understanding the organization of biological systems. However, high-throughput experimental techniques that can produce large numbers of PPIs are known to yield non-negligible rates of false positives and false negatives, which makes protein complexes difficult to identify.
We propose a binary matrix factorization (BMF) algorithm under Bayesian Ying-Yang (BYY) harmony learning to detect protein complexes: it clusters proteins that share similar interactions by factorizing the binary adjacency matrix of a PPI network. The proposed BYY-BMF algorithm determines the number of clusters automatically, whereas most existing BMF algorithms require this number to be given in advance. In addition, BYY-BMF's clustering results do not depend on any parameters or thresholds, unlike the Markov Cluster Algorithm (MCL), which relies on a so-called inflation parameter. On synthetic PPI networks, predictions evaluated against the known annotated complexes indicate that BYY-BMF is more robust than MCL in most cases. On real PPI networks from the MIPS and DIP databases, BYY-BMF obtains better-balanced prediction accuracies than MCL and a spectral analysis method, while MCL has its own advantages, e.g., good separation values.
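To make the idea of factorizing a binary adjacency matrix concrete, the following minimal Python sketch shows how a binary cluster-membership matrix can reproduce a PPI adjacency matrix, and how false-positive and false-negative edges appear as mismatches. It is an illustration only and not the BYY-BMF algorithm: it neither learns the membership matrix nor selects the number of clusters. The function names, the Boolean-product model (two proteins are linked if they share a cluster), and the toy network are all assumptions made for this example.

```python
import numpy as np


def reconstruct_adjacency(membership):
    """Boolean product Z Z^T with a zeroed diagonal (assumed model):
    proteins i and j are predicted to interact if they share a cluster."""
    z = np.asarray(membership, dtype=int)
    a_hat = (z @ z.T > 0).astype(int)
    np.fill_diagonal(a_hat, 0)  # ignore self-interactions
    return a_hat


def mismatch_count(adjacency, membership):
    """Number of protein pairs whose observed edge disagrees with the model."""
    a = np.asarray(adjacency, dtype=int)
    diff = a != reconstruct_adjacency(membership)
    return int(np.triu(diff, k=1).sum())  # count each unordered pair once


# Hypothetical toy network: 5 proteins in two complexes, {0, 1, 2} and {3, 4}.
Z_TRUE = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
A_CLEAN = reconstruct_adjacency(Z_TRUE)

# Simulate experimental noise: one false negative and one false positive.
A_NOISY = A_CLEAN.copy()
A_NOISY[0, 1] = A_NOISY[1, 0] = 0  # missed interaction inside a complex
A_NOISY[2, 3] = A_NOISY[3, 2] = 1  # spurious interaction between complexes

# A poor alternative grouping: every protein in one cluster.
Z_SINGLE = np.ones((5, 1), dtype=int)

print(mismatch_count(A_CLEAN, Z_TRUE))    # fit of the true grouping, no noise
print(mismatch_count(A_NOISY, Z_TRUE))    # fit of the true grouping, with noise
print(mismatch_count(A_NOISY, Z_SINGLE))  # fit of the single-cluster grouping
```

In this toy setting the true grouping still fits the noisy network better than the single-cluster grouping, which is the intuition behind clustering by factorization. Choosing the membership matrix and its number of columns is the hard part, and that is what the BYY-BMF algorithm is described as addressing.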