Mukherjee Kingshuk, Hasan Md Mahmudul, Boucher Christina, Kahveci Tamer
Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA.
BMC Syst Biol. 2018 Apr 11;12(Suppl 1):6. doi: 10.1186/s12918-018-0533-6.
A network motif is a sub-network that occurs frequently in a given network. Detection of such motifs is important since they uncover functions and local properties of the given biological network. Finding motifs is however a computationally challenging task as it requires solving the costly subgraph isomorphism problem. Moreover, the topology of biological networks change over time. These changing networks are called dynamic biological networks. As the network evolves, frequency of each motif in the network also changes. Computing the frequency of a given motif from scratch in a dynamic network as the network topology evolves is infeasible, particularly for large and fast evolving networks.
In this article, we design and develop a scalable method for counting the number of motifs in a dynamic biological network. Our method incrementally updates the frequency of each motif as the underlying network's topology evolves. Our experiments demonstrate that our method can update the frequency of each motif in orders of magnitude faster than counting the motif embeddings every time the network changes. If the network evolves more frequently, the margin with which our method outperforms the existing static methods, increases.
We evaluated our method extensively using synthetic and real datasets, and show that our method is highly accurate(≥ 96%) and that it can be scaled to large dense networks. The results on real data demonstrate the utility of our method in revealing interesting insights on the evolution of biological processes.
网络基序是在给定网络中频繁出现的子网。检测此类基序很重要,因为它们揭示了给定生物网络的功能和局部特性。然而,寻找基序是一项计算上具有挑战性的任务,因为它需要解决代价高昂的子图同构问题。此外,生物网络的拓扑结构会随时间变化。这些不断变化的网络称为动态生物网络。随着网络的演化,网络中每个基序的频率也会改变。在动态网络中,随着网络拓扑的演化,从头计算给定基序的频率是不可行的,特别是对于大型且快速演化的网络。
在本文中,我们设计并开发了一种可扩展的方法来计算动态生物网络中基序的数量。我们的方法随着基础网络拓扑的演化逐步更新每个基序的频率。我们的实验表明,我们的方法更新每个基序频率的速度比每次网络变化时都重新计算基序嵌入要快几个数量级。如果网络演化更频繁,我们的方法相对于现有静态方法的优势就会增加。
我们使用合成数据集和真实数据集对我们的方法进行了广泛评估,结果表明我们的方法具有高度准确性(≥96%),并且可以扩展到大型密集网络。真实数据的结果证明了我们的方法在揭示生物过程演化方面有趣见解的实用性。