Computer Science, North Dakota State University, Fargo, North Dakota, USA.
Computer Science and Engineering, Qatar University, Doha, Qatar.
BMC Bioinformatics. 2024 Nov 14;25(1):356. doi: 10.1186/s12859-024-05960-x.
Networks have emerged as a natural data structure for representing relations among entities. Proteins interact to carry out cellular functions, and protein-protein interaction network analysis has been employed to understand the cellular machinery. Advances in genomics technologies have enabled the collection of large datasets that annotate proteins in interaction networks. Integrative analysis of interaction networks with gene expression data and annotations enables the discovery of context-specific complexes and improves the identification of functional modules and pathways. Extracting subnetworks whose vertices are connected and have high attribute similarity has applications in diverse domains. We present an enumeration approach for mining sets of connected and cohesive subgraphs, where the vertices in each subgraph have similar attribute profiles. Because the number of cohesive connected subgraphs is large and these subgraphs overlap with one another, we propose an algorithm that enumerates a representative set of subgraphs: the set of all closed subgraphs. We propose pruning strategies for efficiently traversing the search tree without missing any pattern or reporting duplicate subgraphs. On a real protein-protein interaction network whose attributes represent the dysregulation profile of genes across multiple cancers, we mine closed cohesive connected subnetworks and show their biological significance. Moreover, we conduct a runtime comparison with existing algorithms to demonstrate the efficiency of our proposed algorithm.
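To make the notions of "cohesive", "connected", and "closed" concrete, the following is a minimal brute-force sketch on a toy graph. It is not the authors' algorithm, and the definitions are assumptions chosen for illustration: a vertex set is treated as cohesive on an attribute if all its values lie within a range of `w`; `D(V)` is the set of attributes on which `V` is cohesive; and a connected set `V` is treated as closed if no connected proper superset has the same `D(V)`. Because `D` can only shrink as vertices are added, a set with fewer than `min_dims` cohesive attributes can be pruned together with all of its extensions. Duplicates are avoided here with a simple visited set rather than the paper's pruning strategies. The function names (`cohesive_dims`, `enumerate_cohesive_connected`, `closed_sets`) and the example data are hypothetical.

```python
def cohesive_dims(vertices, attrs, w):
    """Attributes on which all vertices lie within a range of w (assumed cohesion rule)."""
    n_dims = len(next(iter(attrs.values())))
    dims = set()
    for d in range(n_dims):
        vals = [attrs[v][d] for v in vertices]
        if max(vals) - min(vals) <= w:
            dims.add(d)
    return frozenset(dims)


def enumerate_cohesive_connected(adj, attrs, w, min_dims=1):
    """Map every connected vertex set with >= min_dims cohesive attributes to its attributes."""
    found = {}
    stack = [frozenset([v]) for v in adj]
    seen = set(stack)  # visited set: each vertex set is evaluated at most once
    while stack:
        vs = stack.pop()
        dims = cohesive_dims(vs, attrs, w)
        if len(dims) < min_dims:
            continue  # D only shrinks on extension, so no superset can qualify
        found[vs] = dims
        # Grow by one neighbouring vertex so the set stays connected.
        frontier = {u for v in vs for u in adj[v]} - vs
        for u in frontier:
            nxt = vs | {u}
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def closed_sets(found, adj, min_size=1):
    """Keep sets with no connected one-vertex extension sharing the same cohesive attributes."""
    closed = {}
    for vs, dims in found.items():
        if len(vs) < min_size:
            continue
        frontier = {u for v in vs for u in adj[v]} - vs
        # A connected proper superset with equal D implies a one-vertex extension with equal D.
        if all(found.get(vs | {u}) != dims for u in frontier):
            closed[vs] = dims
    return closed


# Toy example (hypothetical data): a path 0-1-2-3 with two attributes per vertex.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
attrs = {0: (1.0, 5.0), 1: (1.2, 5.1), 2: (1.1, 9.0), 3: (4.0, 9.2)}

found = enumerate_cohesive_connected(adj, attrs, w=0.5)
closed = closed_sets(found, adj, min_size=2)
for vs, dims in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(vs), "cohesive on attributes", sorted(dims))
# [0, 1] cohesive on attributes [0, 1]
# [0, 1, 2] cohesive on attributes [0]
# [2, 3] cohesive on attributes [1]
```

In this toy run, `{1, 2}` is cohesive on attribute 0 but is not reported, because its connected superset `{0, 1, 2}` is cohesive on exactly the same attribute; reporting only closed sets removes this kind of redundancy. Exhaustive enumeration like this grows exponentially with graph size, which is why dedicated pruning strategies matter on real interaction networks.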