IEEE/ACM Trans Comput Biol Bioinform. 2019 Jul-Aug;16(4):1049-1062. doi: 10.1109/TCBB.2018.2821666. Epub 2018 Apr 2.
Biological networks provide great potential to understand how cells function. Motifs are topological patterns which are repeated frequently in a specific network. Network motifs are key structures through which biological networks operate. However, counting independent (i.e., non-overlapping) instances of a specific motif remains to be a computationally hard problem. Motif counting problem becomes computationally even harder for biological networks as biological interactions are uncertain events. The main challenge behind this problem is that different embeddings of a given motif in a network can share edges. Such edges can create complex computational dependencies between different instances of the given motif when considering uncertainty of those edges. In this paper, we develop a novel algorithm for counting independent instances of a specific motif topology in probabilistic biological networks. We present a novel mathematical model to capture the dependency between each embedding and all the other embeddings, which it overlaps with. We prove the correctness of this model. We evaluate our model on real and synthetic networks with different probability, and topology models as well as reasonable range of network sizes. Our results demonstrate that our method counts non-overlapping embeddings in practical time for a broad range of networks.
生物网络为理解细胞功能提供了巨大的潜力。模体是在特定网络中频繁重复的拓扑模式。网络模体是生物网络运行的关键结构。然而,对于特定模体的独立(即非重叠)实例的计数仍然是一个计算上的难题。对于生物网络来说,由于生物相互作用是不确定的事件,因此 motif 计数问题在计算上变得更加困难。这个问题的主要挑战在于,给定 motif 在网络中的不同嵌入可能共享边。当考虑这些边的不确定性时,这种边可以在给定 motif 的不同实例之间创建复杂的计算依赖关系。在本文中,我们开发了一种在概率生物网络中计算特定 motif 拓扑独立实例的新算法。我们提出了一种新的数学模型来捕获每个嵌入与其重叠的所有其他嵌入之间的依赖关系。我们证明了这个模型的正确性。我们在不同概率、拓扑模型以及合理的网络规模范围内的真实和合成网络上评估了我们的模型。我们的结果表明,我们的方法可以在实际时间内对广泛的网络计算非重叠的嵌入。