Suppr超能文献

RASMA:一种用于挖掘最大频繁子图的反向搜索算法。

RASMA: a reverse search algorithm for mining maximal frequent subgraphs.

作者信息

Salem Saeed, Alokshiya Mohammed, Hasan Mohammad Al

机构信息

North Dakota State University, Fargo, ND, 58102, USA.

Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA.

出版信息

BioData Min. 2021 Mar 16;14(1):19. doi: 10.1186/s13040-021-00250-1.

Abstract

BACKGROUND

Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; A frequent subgraph is maximal if it does not have a super-graph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification.

RESULTS

We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs.

CONCLUSION

Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.

摘要

背景

给定一组基因上的共表达网络集合,识别频繁出现的子网络是一个重要的研究问题,即挖掘频繁子图。最大频繁子图是频繁子图的一个代表性集合;如果一个频繁子图没有频繁的超图,那么它就是最大的。在生物信息学领域,挖掘频繁和/或最大频繁子图的方法可用于发现有趣的网络基序,这些基序通过频繁子网络的边反映基因之间的复杂相互作用。对频繁共表达子网络的进一步研究有助于发现基因表达和疾病分类的生物模块和生物特征。

结果

我们提出了一种反向搜索算法,称为RASMA,用于在给定的图集合中挖掘频繁和最大频繁子图。RASMA的一个关键创新是一个连通子图枚举器,它使用反向搜索策略来枚举无向图的连通子图。使用这种枚举策略,RASMA能够非常高效地获得所有最大频繁子图。为了克服在挖掘最大频繁子图时枚举所有频繁子图的计算量过大的任务,RASMA采用了几种剪枝策略,显著提高了其整体运行时性能。实验结果表明,在大型基因共表达网络上,该算法能够有效地挖掘出与生物学相关的最大频繁子图。

结论

从多个基因表达实验中提取反复出现的基因共表达子网络,有助于发现功能模块和子网络生物标志物。我们提出了一种用于挖掘最大频繁子网络的反向搜索算法。对提取的最大频繁子网络的富集分析表明,频繁出现的子网络高度富集了已知的生物学本体。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a46b/7962222/59c4afce713f/13040_2021_250_Fig1_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验