IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):836-849. doi: 10.1109/TCBB.2020.2980260. Epub 2021 Jun 3.
Graph models often give us a deeper understanding of real-world networks. For biological networks, they help in predicting the evolution and history of biomolecule interactions, provided that we properly map real networks onto the corresponding graph models. In this paper, we show that, for biological graph models, many existing parameter estimation techniques overlook the critical property of graph symmetry (formally, graph automorphisms), so the estimated parameters yield results that are statistically insignificant with respect to the observed network. To demonstrate this and to develop accurate estimation procedures, we focus on the biologically inspired duplication-divergence model and on up-to-date protein-protein interaction data for seven species, including human and yeast. Using exact recurrence relations for some prominent graph statistics, we devise a parameter estimation technique that provides the right order of symmetries and uses phylogenetically old proteins as the seed graph nodes. We also find that our results are consistent with those obtained from maximum likelihood estimation (MLE). In practice, however, the MLE approach is significantly slower than our methods.
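The abstract does not spell out the duplication-divergence model or how symmetries are counted, so the following is only a minimal sketch under stated assumptions. It assumes one commonly used formulation: at each step a uniformly chosen existing node is duplicated, each copied edge is retained with probability `p`, and the new node is joined to every other existing node with probability `r / t`, where `t` is the current number of nodes. The function names, the parameters `p` and `r`, and the brute-force automorphism counter are illustrative choices, not the paper's implementation.

```python
import random
from itertools import permutations


def duplication_divergence(seed_edges, n_seed, n_final, p, r, rng=None):
    """Grow an undirected graph by duplication-divergence steps.

    Assumed formulation (not taken from the abstract): the new node keeps
    each edge of its parent with probability p and gains an edge to every
    other existing node with probability r / t, t = current node count.
    Returns an adjacency dict {node: set(neighbors)}.
    """
    rng = rng or random.Random()
    adj = {v: set() for v in range(n_seed)}
    for u, v in seed_edges:
        adj[u].add(v)
        adj[v].add(u)
    for new in range(n_seed, n_final):
        t = len(adj)                  # current number of nodes
        parent = rng.randrange(t)     # node chosen for duplication
        neighbors = set()
        for w in range(t):
            if w in adj[parent]:
                keep = rng.random() < p       # retain a copied edge
            else:
                keep = rng.random() < r / t   # add a random extra edge
            if keep:
                neighbors.add(w)
        adj[new] = neighbors
        for w in neighbors:
            adj[w].add(new)
    return adj


def count_automorphisms(adj):
    """Count graph automorphisms by brute force (tiny graphs only)."""
    nodes = sorted(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    count = 0
    for perm in permutations(nodes):
        mapping = dict(zip(nodes, perm))
        # A bijection on nodes that maps every edge to an edge is an
        # automorphism, because the edge set is finite.
        if all(frozenset(mapping[x] for x in e) in edges for e in edges):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
graph = duplication_divergence(triangle, n_seed=3, n_final=8, p=0.5, r=0.4,
                               rng=random.Random(0))
symmetries = count_automorphisms(graph)
```

The brute-force counter enumerates all node permutations, so it is only usable on graphs with a handful of nodes. Comparing the symmetry count of a simulated graph against that of an observed network is the kind of check the abstract argues estimation procedures should pass, though real networks would need a dedicated automorphism tool.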