Pattern Discovery in Multilayer Networks.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):741-752. doi: 10.1109/TCBB.2021.3105001. Epub 2022 Apr 1.

Abstract

MOTIVATION

In bioinformatics, modeling complex cellular systems and simulating their behavior to identify significant molecular interactions is considered a relevant problem. Traditional methods model such systems as a single binary network. This model is inadequate for biological networks because different sets of interactions can take place simultaneously under different interaction constraints (such as transcription regulation and protein interaction). Furthermore, biological systems may exhibit varying interaction topologies even for the same interaction type under different developmental stages or stress conditions. Models that treat biological systems as solitary interactions are therefore inaccurate, as they fail to capture the complex behavior of cellular interactions within organisms. Identifying and counting recurrent motifs within a network is one of the fundamental problems in biological network analysis. Existing methods for motif counting on single network topologies cannot capture patterns of molecular interactions that change significantly in biological expression across similar organisms, or across time-varying networks within the same organism. That is, they fail to identify recurrent interactions because they consider only a single snapshot out of a set of multiple networks. We therefore need methods geared towards studying multiple network topologies and the pattern conservation among them.

Contributions: In this paper, we consider the problem of counting the number of instances of a user-supplied motif topology in a given multilayer network. We model interactions among a set of entities (e.g., genes) under various conditions or temporal variation as a multilayer network, in which each layer is a separate network showing the connectivity of the nodes under a unique network state. Existing motif counting and identification methods are limited to single network topologies and thus cannot be applied directly to multilayer networks. We apply our model and algorithm to study frequent patterns in cellular networks that are common across varying cellular states under different stress conditions, where the cellular network topology under each stress condition forms a unique network layer.
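The problem statement above can be made concrete with a small brute-force sketch. It is not the algorithm proposed in the paper, which the abstract does not describe. It only illustrates the data model (one edge set per layer over a shared node set) and one possible counting rule. The layer names, the toy edges, the feed-forward-loop motif, and the `min_layers` threshold are all assumptions introduced for illustration.

```python
from itertools import permutations

# Hypothetical toy data: each layer is a set of directed edges over shared nodes.
layers = {
    "heat": {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")},
    "cold": {("a", "b"), ("b", "c"), ("a", "c")},
    "acid": {("a", "b"), ("b", "c"), ("c", "d")},
}

# User-supplied motif: a feed-forward loop on placeholder nodes 0, 1, 2.
motif_edges = [(0, 1), (1, 2), (0, 2)]


def embeddings(edges, motif, nodes):
    """Return the motif embeddings in one layer, each as a frozenset of edges.

    Brute force over ordered node tuples; matching is non-induced, so extra
    edges among the chosen nodes are allowed.
    """
    k = 1 + max(max(e) for e in motif)
    found = set()
    for combo in permutations(nodes, k):
        mapped = frozenset((combo[u], combo[v]) for u, v in motif)
        if mapped <= edges:
            found.add(mapped)
    return found


def count_conserved(layers, motif, min_layers):
    """Map each embedding to the layers containing it, keeping those with
    support in at least `min_layers` layers (an assumed conservation rule)."""
    nodes = sorted({n for edges in layers.values() for e in edges for n in e})
    support = {}
    for name, edges in layers.items():
        for emb in embeddings(edges, motif, nodes):
            support.setdefault(emb, set()).add(name)
    return {emb: names for emb, names in support.items() if len(names) >= min_layers}


conserved = count_conserved(layers, motif_edges, min_layers=2)
print(len(conserved))
```

Identifying an embedding by its edge set means that the same subgraph found in two layers is counted once, with both layers recorded as support. The enumeration is exponential in motif size, so it is useful only for checking small cases by hand.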

RESULTS

We develop a methodology and corresponding algorithm based on the proposed model for motif counting in multilayer networks. We performed experiments on both real and synthetic datasets. We generated the synthetic datasets under a wide spectrum of parameters, such as network size, density, and motif frequency. Results on synthetic datasets demonstrate that our algorithm finds motif embeddings with very high accuracy compared to existing state-of-the-art methods such as G-tries, ESU (FANMODE), and mfinder. Furthermore, we observe that our method runs from several times to several orders of magnitude faster than existing methods. For experiments on real data, we consider the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions. We observe that the genes selected by our method conserve functional characteristics under various stress conditions with very low false discovery rates. Moreover, the method is scalable to real networks in terms of both network size and number of layers.
