Ma Cheng-Yu, Chen Yi-Ping Phoebe, Berger Bonnie, Liao Chung-Shou
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Vic, Australia.
Bioinformatics. 2017 Jun 1;33(11):1681-1688. doi: 10.1093/bioinformatics/btx043.
Protein complexes are key to understanding the behavior of a cell system, because many biological functions are carried out by them. Over the past decade, the main strategy for identifying protein complexes from high-throughput network data has been to extract near-cliques or highly dense subgraphs from a single protein-protein interaction (PPI) network. Although experimental PPI data have grown significantly in recent years, most PPI networks still contain many false positive interactions and false negatives (missing edges) owing to the limitations of high-throughput experiments. The false negatives in particular restrict the search space of such conventional approaches. Automatic identification of protein complexes has therefore become one of the most challenging tasks in systems biology.
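To make the effect of missing edges concrete, the following minimal Python sketch computes the edge density of a candidate subgraph. It is an illustration only, not part of the published method: the four-protein complex, the protein labels, and the set of dropped edges are all hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Fraction of all possible node pairs that are present as edges."""
    nodes = sorted(set(nodes))
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(
        1 for u, v in combinations(nodes, 2) if frozenset((u, v)) in edges
    )
    return present / (n * (n - 1) / 2)


# Hypothetical complex of four proteins that all interact pairwise.
complex_nodes = ["A", "B", "C", "D"]
true_edges = {frozenset(pair) for pair in combinations(complex_nodes, 2)}

# Simulate false negatives: three of the six interactions were not observed.
missing = {frozenset(("A", "B")), frozenset(("C", "D")), frozenset(("A", "D"))}
observed_edges = true_edges - missing

true_density = density(complex_nodes, true_edges)
observed_density = density(complex_nodes, observed_edges)
```

In this toy case the true complex is a clique, but the observed subgraph keeps only half of its edges, so a method that searches a single network for near-cliques or very dense subgraphs could plausibly miss it.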
In this study, we propose a new algorithm, NEOComplex (NECC- and Ortholog-based Complex identification by multiple network alignment), which integrates functional orthology information obtained from different types of multiple network alignment (MNA) approaches to expand the search space of protein complex detection. As part of our approach, we also define a new edge clustering coefficient (NECC) that assigns weights to interaction edges in PPI networks so that protein complexes can be identified more accurately. The NECC is based on the intuition that functional information is captured not only in the common neighbors of an edge's endpoints but also in the common neighbors of those common neighbors. Our results show that our algorithm outperforms well-known protein complex identification tools in balancing precision and recall on three eukaryotic species: human, yeast, and fly. Because it draws on MNAs of these species, the proposed approach can tolerate edge loss in PPI networks and can even discover sparse protein complexes, which have traditionally been difficult to predict.
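The abstract does not give the NECC formula, so the Python sketch below only illustrates the stated intuition: it scores an edge by its first-order common neighbors plus a discounted count of second-order ones. The scoring rule, the discount `alpha`, the reading of "common neighbors of the common neighbors", and the toy graph are all assumptions made for illustration, not the paper's definition.

```python
from collections import defaultdict


def build_adjacency(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def first_order(adj, u, v):
    """Common neighbors of the two endpoints of edge (u, v)."""
    return adj[u] & adj[v]


def second_order(adj, u, v):
    """Nodes that share a neighbor relation with an endpoint and one of the
    edge's common neighbors, excluding the endpoints and first-order nodes.
    This is one assumed reading of 'common neighbors of common neighbors'."""
    first = first_order(adj, u, v)
    found = set()
    for c in first:
        found |= adj[u] & adj[c]
        found |= adj[v] & adj[c]
    return found - {u, v} - first


def edge_weight(adj, u, v, alpha=0.5):
    """Illustrative score (NOT the published NECC): first-order count plus
    alpha times the second-order count. alpha is a hypothetical parameter."""
    return len(first_order(adj, u, v)) + alpha * len(second_order(adj, u, v))


# Hypothetical toy network.
toy_edges = [
    ("u", "v"), ("u", "a"), ("v", "a"), ("u", "b"),
    ("v", "b"), ("a", "b"), ("a", "c"), ("u", "c"),
]
adj = build_adjacency(toy_edges)
weight_uv = edge_weight(adj, "u", "v")
```

Here edge (u, v) has two first-order common neighbors (a and b) and one second-order node (c), so a weighting of this kind would rank it above an edge whose endpoints share little neighborhood structure.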
http://acolab.ie.nthu.edu.tw/bionetwork/NEOComplex.
bab@csail.mit.edu or csliao@ie.nthu.edu.tw.
Supplementary data are available at Bioinformatics online.