Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore 117417, Singapore.
School of Computational Science and Engineering, College of Computing, Georgia Institute of Technology, Atlanta 30308, GA, USA.
Bioinformatics. 2021 May 23;37(8):1083-1092. doi: 10.1093/bioinformatics/btaa931.
The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication-Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein-protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions.
We present a new Integer Linear Programming (ILP) solution for maximum likelihood reconstruction of ancestral PPI networks using the DMC model. We prove the correctness of our solution that is designed to find the optimal solution. It can also use efficient heuristics from general-purpose ILP solvers to obtain multiple optimal and near-optimal solutions that may be useful in many applications. Experiments on synthetic data show that our ILP obtains solutions with higher likelihood than those from previous methods, and is robust to noise and model mismatch. We evaluate our algorithm on two real PPI networks, with proteins from the families of bZIP transcription factors and the Commander complex. On both the networks, solutions from our ILP have higher likelihood and are in better agreement with independent biological evidence from other studies.
A Python implementation is available at https://bitbucket.org/cdal/network-reconstruction.
Supplementary data are available at Bioinformatics online.
研究生物网络的进化历史能够深入理解各种生物分子过程的功能。网络增长模型,如 Duplication-Mutation with Complementarity(DMC)模型,为基于复制和分歧来刻画蛋白质-蛋白质相互作用(PPIs)的进化提供了一种原则性的方法。基于模型的祖先网络重建的当前方法主要使用贪婪启发式算法,得到的是次优解。
我们提出了一种新的整数线性规划(ILP)解决方案,用于使用 DMC 模型进行祖先 PPI 网络的最大似然重建。我们证明了我们的解决方案的正确性,该解决方案旨在找到最优解。它还可以使用通用 ILP 求解器的有效启发式算法来获得多个最优和近最优的解决方案,这些解决方案在许多应用中可能很有用。在合成数据上的实验表明,我们的 ILP 获得的解比以前的方法具有更高的似然度,并且对噪声和模型不匹配具有鲁棒性。我们在两个真实的 PPI 网络上评估了我们的算法,这些网络的蛋白质来自 bZIP 转录因子家族和 Commander 复合物。在这两个网络上,我们的 ILP 得到的解具有更高的似然度,并且与来自其他研究的独立生物学证据更一致。
一个 Python 实现可在 https://bitbucket.org/cdal/network-reconstruction 获得。
补充数据可在生物信息学在线获得。