RASMA: a reverse search algorithm for mining maximal frequent subgraphs.

Authors

Saeed Salem, Mohammed Alokshiya, Mohammad Al Hasan

Affiliations

North Dakota State University, Fargo, ND, 58102, USA.

Indiana University-Purdue University Indianapolis, Indianapolis, IN, 46202, USA.

Publication information

BioData Min. 2021 Mar 16;14(1):19. doi: 10.1186/s13040-021-00250-1.

DOI:10.1186/s13040-021-00250-1

PMID:33726790

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7962222/

Abstract

BACKGROUND

Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as frequent subgraph mining. Maximal frequent subgraphs form a representative set of frequent subgraphs: a frequent subgraph is maximal if it has no frequent supergraph. In bioinformatics, methods for mining frequent and/or maximal frequent subgraphs can be used to discover network motifs that elucidate complex interactions among genes, reflected in the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks aids the discovery of biological modules and biological signatures for gene expression and disease classification.
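To make the definition of maximality concrete, here is a minimal brute-force sketch. It assumes that all networks share the same gene labels, so a candidate subgraph can be represented as a set of edges and its support is the number of networks containing every one of those edges. The helper names (`edge`, `support`, `maximal_only`), the toy data, and the threshold are illustrative assumptions, not taken from the paper, and connectivity is ignored here for brevity.

```python
from itertools import combinations

def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def support(edges, networks):
    """Number of networks that contain every edge of the candidate subgraph."""
    return sum(1 for net in networks if edges <= net)

def maximal_only(frequent):
    """Keep the frequent edge sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]

# Three toy coexpression networks over genes A-D (hypothetical data).
networks = [
    {edge("A", "B"), edge("B", "C"), edge("C", "D")},
    {edge("A", "B"), edge("B", "C")},
    {edge("A", "B"), edge("B", "C"), edge("A", "D")},
]
min_support = 2  # assumed threshold

# Exhaustively test every edge subset; feasible only for toy inputs.
all_edges = sorted(set().union(*networks), key=sorted)
frequent = [
    frozenset(c)
    for k in range(1, len(all_edges) + 1)
    for c in combinations(all_edges, k)
    if support(frozenset(c), networks) >= min_support
]
maximal = maximal_only(frequent)
```

In this toy input the edge sets {A-B}, {B-C}, and {A-B, B-C} are all frequent, but only {A-B, B-C} is maximal, which is why the maximal set is a compact summary of the frequent ones.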

RESULTS

We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate the connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs efficiently. Because enumerating all frequent subgraphs while mining for the maximal ones is computationally prohibitive, RASMA employs several pruning strategies that substantially improve its overall runtime. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs.
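The general shape of such a miner can be sketched as follows. This is not RASMA's reverse-search enumerator: as a simpler stand-in, it grows connected edge sets one adjacent edge at a time and uses a visited set to avoid duplicates, which is the bookkeeping that reverse search is designed to avoid. The only pruning shown is the standard anti-monotone rule (an infrequent edge set is never extended). Function names, the shared-gene-label representation, and the toy data are assumptions for illustration.

```python
def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def mine_maximal(networks, min_support):
    """Return maximal frequent connected edge sets (naive visited-set search)."""
    def support(edges):
        return sum(1 for net in networks if edges <= net)

    # Only edges that are frequent on their own can appear in a frequent subgraph.
    frequent_edges = {e for net in networks for e in net
                      if support(frozenset([e])) >= min_support}
    seen, maximal = set(), []
    stack = [frozenset([e]) for e in frequent_edges]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        vertices = set().union(*current)
        extended = False
        for e in frequent_edges - current:
            if e & vertices:  # edge touches the current subgraph, so it stays connected
                candidate = current | {e}
                if support(candidate) >= min_support:  # prune infrequent extensions
                    extended = True
                    stack.append(candidate)
        if not extended:  # no frequent one-edge extension, hence maximal
            maximal.append(current)
    return maximal

# Hypothetical toy input: genes A-F, threshold of 2 networks.
networks = [
    {edge("A", "B"), edge("B", "C"), edge("C", "D"), edge("E", "F")},
    {edge("A", "B"), edge("B", "C"), edge("E", "F")},
    {edge("A", "B"), edge("B", "C"), edge("A", "D"), edge("E", "F")},
]
result = mine_maximal(networks, min_support=2)
```

Checking only one-edge extensions is sufficient for maximality here: any frequent connected superset contains an edge adjacent to the current subgraph, and adding that edge alone yields a frequent connected extension.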

CONCLUSION

Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that frequent subnetworks are highly enriched with known biological ontologies.
