Computer and Information Science and Engineering, University of Florida, Gainesville, 32611, FL, USA.
Departments of Biology and Mathematics, Colgate University, Hamilton, 13346, NY, USA.
BMC Bioinformatics. 2019 Jun 20;20(Suppl 12):318. doi: 10.1186/s12859-019-2838-x.
Identifying motifs (recurrent, statistically significant patterns) in biological networks is key to understanding the design principles of biological systems and to inferring their governing mechanisms. This is a computationally challenging task, and it is further complicated because biological interactions depend on limited resources: a reaction takes place only if the reactant molecule concentrations are above a certain threshold level. This biochemical property implies that each network edge can participate in only a limited number of motifs simultaneously. Existing motif counting methods ignore this constraint, a simplification that often leads to inaccurate motif counts (over- or under-estimates) and thus to wrong biological interpretations.
In this paper, we develop a novel motif counting algorithm, Partially Overlapping MOtif Counting (POMOC), that takes the capacity level of every interaction into account when counting motifs.
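To make the capacity idea concrete, the following is a minimal sketch in Python, not the POMOC algorithm itself. It assumes an undirected network, uses the triangle as the motif, and applies a simple greedy rule: a triangle is counted only if each of its three edges still has spare capacity. The function name `greedy_capacity_count` and the dictionary-based input format are hypothetical choices made for illustration, and the greedy result depends on enumeration order, so it is not guaranteed to be the maximum feasible count.

```python
from itertools import combinations


def greedy_capacity_count(capacities):
    """Greedily count triangles under per-edge capacity limits.

    capacities: dict mapping an undirected edge (u, v) to the maximum
    number of counted triangles that edge may take part in.
    Illustrative assumption only; this is not the published POMOC method.
    """
    # Store edges as frozensets so that (u, v) and (v, u) are the same key.
    remaining = {frozenset(edge): cap for edge, cap in capacities.items()}
    nodes = sorted({v for edge in capacities for v in edge})

    count = 0
    # Visit candidate triangles in a fixed (sorted) order.
    for a, b, c in combinations(nodes, 3):
        triangle = (frozenset((a, b)), frozenset((a, c)), frozenset((b, c)))
        # Accept the triangle only if all three edges exist and have capacity left.
        if all(remaining.get(edge, 0) > 0 for edge in triangle):
            for edge in triangle:
                remaining[edge] -= 1
            count += 1
    return count


# Complete graph on 4 nodes: 6 edges, 4 triangles, each edge lies in 2 triangles.
k4_edges = list(combinations(range(4), 2))

strict = greedy_capacity_count({e: 1 for e in k4_edges})   # no edge sharing allowed
relaxed = greedy_capacity_count({e: 2 for e in k4_edges})  # each edge may be used twice
```

With a capacity of 1 per edge the sketch counts a single triangle, whereas a capacity of 2 per edge admits all four, which illustrates how the same topology can yield different motif counts depending on the capacity assigned to each interaction.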
Our experiments on real and synthetic networks demonstrate that motif counts obtained with POMOC differ significantly from those of existing motif counting approaches, and that our method scales to large biological networks in practical time. Our results also show that the method makes it possible to characterize the impact of different stress factors on the organization of a cell's network. In particular, our analysis of an S. cerevisiae transcriptional regulatory network shows that oxidative stress is more disruptive to the organization and abundance of motifs in this network than mutations of individual genes. Our analysis also suggests that, by focusing on the edges that lead to variation in motif counts, the method can be used to find important genes and to reveal subtle topological and functional differences between biological networks under different cell states.