• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

大型稀疏二分图的(p,q)双团计数与枚举

(p,q)-biclique counting and enumeration for large sparse bipartite graphs.

作者信息

Yang Jianye, Peng Yun, Ouyang Dian, Zhang Wenjie, Lin Xuemin, Zhao Xiang

机构信息

Guangzhou University, Guangzhou, China.

The University of New South Wales, Sydney, Australia.

出版信息

VLDB J. 2023 Mar 13:1-25. doi: 10.1007/s00778-023-00786-0.

DOI:10.1007/s00778-023-00786-0
PMID:37362202
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC10008723/
Abstract

In this paper, we study the problem of (, )-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph and two integer parameters and , we aim to efficiently count and enumerate all (, )-bicliques in , where a (, )-biclique (, ) is a complete subgraph of with , , , and . The problem of (, )-biclique counting and enumeration has many applications, such as graph neural network information aggregation, densest subgraph detection, and cohesive subgroup analysis. Despite the wide range of applications, to the best of our knowledge, we note that there is no efficient and scalable solution to this problem in the literature . This problem is computationally challenging, due to the worst-case exponential number of (, )-bicliques. In this paper, we propose a competitive branch-and-bound baseline method, namely BCList, which explores the search space in a depth-first manner, together with a variety of pruning techniques. Although BCList offers a useful computation framework to our problem, its worst-case time complexity is exponential to . To alleviate this, we propose an advanced approach, called BCList++. Particularly, BCList++ applies a layer-based exploring strategy to enumerate (, )-bicliques by anchoring the search on either or only, which has a worst-case time complexity exponential to either or only. Consequently, a vital task is to choose a layer with the least computation cost. To this end, we develop a cost model, which is built upon an unbiased estimator for the density of 2-hop graph induced by or . To improve computation efficiency, BCList++ exploits pre-allocated arrays and vertex labeling techniques such that the frequent subgraph creating operations can be substituted by array element switching operations. We conduct extensive experiments on 16 real-life datasets, and the experimental results demonstrate that BCList++ significantly outperforms the baseline methods by up to 3 orders of magnitude. We show via a case study that (, )-bicliques optimizes the efficiency of graph neural networks. In this paper, we extend our techniques to count and enumerate (, )-bicliques on uncertain bipartite graphs. An efficient method IUBCList is developed on the top of BCList++, together with a couple of pruning techniques, including common neighbor refinement and search branch early termination, to discard unpromising uncertain (, )-bicliques early. The experimental results demonstrate that IUBCList significantly outperforms the baseline method by up to 2 orders of magnitude.

摘要

在本文中,我们研究了大型稀疏二分图的(, )-双团计数与枚举问题。给定一个二分图以及两个整数参数 和 ,我们旨在高效地计数并枚举图 中所有的(, )-双团,其中(, )-双团(, )是图 的一个完全子图,满足 , , , 。(, )-双团计数与枚举问题有许多应用,如图神经网络信息聚合、最密集子图检测和凝聚子群分析。尽管应用广泛,但据我们所知,文献中尚无针对此问题的高效且可扩展的解决方案。由于(, )-双团的数量在最坏情况下呈指数级增长,该问题在计算上具有挑战性。在本文中,我们提出了一种具有竞争力的分支定界基线方法,即BCList,它以深度优先的方式探索搜索空间,并结合了多种剪枝技术。尽管BCList为我们的问题提供了一个有用的计算框架,但其最坏情况下的时间复杂度与 呈指数关系。为了缓解这一问题,我们提出了一种先进的方法,称为BCList++。具体而言,BCList++应用基于层的探索策略,通过仅将搜索锚定在 或 上来枚举(, )-双团,其最坏情况下的时间复杂度仅与 或 呈指数关系。因此,一项关键任务是选择计算成本最低的层。为此,我们开发了一个成本模型,该模型基于对由 或 诱导的2-跳图密度的无偏估计器构建。为了提高计算效率,BCList++利用预分配数组和顶点标记技术,使得频繁的子图创建操作可以被数组元素切换操作所替代。我们在16个真实数据集上进行了广泛的实验,实验结果表明BCList++比基线方法性能显著提升,高达3个数量级。通过一个案例研究,我们表明(, )-双团优化了图神经网络的效率。在本文中,我们将我们的技术扩展到对不确定二分图上的(, )-双团进行计数和枚举。在BCList++的基础上开发了一种高效方法IUBCList,并结合了一些剪枝技术,包括共同邻居细化和搜索分支提前终止,以尽早舍弃没有前景的不确定(, )-双团。实验结果表明,IUBCList比基线方法性能显著提升,高达2个数量级。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cab/10008723/ffabddbd8187/778_2023_786_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cab/10008723/ffabddbd8187/778_2023_786_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cab/10008723/ffabddbd8187/778_2023_786_Fig5_HTML.jpg

相似文献

1
(p,q)-biclique counting and enumeration for large sparse bipartite graphs.大型稀疏二分图的(p,q)双团计数与枚举
VLDB J. 2023 Mar 13:1-25. doi: 10.1007/s00778-023-00786-0.
2
Biclique: an R package for maximal biclique enumeration in bipartite graphs.双团:用于二分图中最大双团枚举的R语言包。
BMC Res Notes. 2020 Feb 21;13(1):88. doi: 10.1186/s13104-020-04955-0.
3
A Biclique Approach to Reference-Anchored Gene Blocks and Its Applications to Genomic Islands.一种用于参考锚定基因块的双簇方法及其在基因组岛中的应用。
J Comput Biol. 2018 Feb;25(2):214-235. doi: 10.1089/cmb.2017.0108. Epub 2017 Oct 13.
4
On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types.在二分图中寻找双团块:一种新算法及其在整合多种生物数据类型中的应用。
BMC Bioinformatics. 2014 Apr 15;15:110. doi: 10.1186/1471-2105-15-110.
5
Biclique communities.双团社区
Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Jul;78(1 Pt 2):016108. doi: 10.1103/PhysRevE.78.016108. Epub 2008 Jul 21.
6
Erratum: Eyestalk Ablation to Increase Ovarian Maturation in Mud Crabs.勘误:切除眼柄以增加泥蟹的卵巢成熟度。
J Vis Exp. 2023 May 26(195). doi: 10.3791/6561.
7
A linear delay algorithm for enumerating all connected induced subgraphs.一种用于枚举所有连通诱导子图的线性延迟算法。
BMC Bioinformatics. 2019 Jun 20;20(Suppl 12):319. doi: 10.1186/s12859-019-2837-y.
8
Faster algorithms for counting subgraphs in sparse graphs.用于稀疏图中子图计数的更快算法。
Algorithmica. 2021;83(8):2578-2605. doi: 10.1007/s00453-021-00811-0. Epub 2021 Feb 22.
9
Discriminative Feature Selection for Uncertain Graph Classification.用于不确定图分类的判别特征选择
Proc SIAM Int Conf Data Min. 2013;2013:82-93. doi: 10.1137/1.9781611972832.10.
10
Protein binding hot spots and the residue-residue pairing preference: a water exclusion perspective.蛋白质结合热点和残基-残基配对偏好:一种水排斥观点。
BMC Bioinformatics. 2010 May 12;11:244. doi: 10.1186/1471-2105-11-244.

本文引用的文献

1
Higher-order organization of complex networks.复杂网络的高阶组织
Science. 2016 Jul 8;353(6295):163-6. doi: 10.1126/science.aad9029.
2
On finding bicliques in bipartite graphs: a novel algorithm and its application to the integration of diverse biological data types.在二分图中寻找双团块:一种新算法及其在整合多种生物数据类型中的应用。
BMC Bioinformatics. 2014 Apr 15;15:110. doi: 10.1186/1471-2105-15-110.
3
Biclustering algorithms for biological data analysis: a survey.用于生物数据分析的双聚类算法:一项综述。
IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar;1(1):24-45. doi: 10.1109/TCBB.2004.2.
4
Cycles and clustering in bipartite networks.二分网络中的循环与聚类
Phys Rev E Stat Nonlin Soft Matter Phys. 2005 Nov;72(5 Pt 2):056127. doi: 10.1103/PhysRevE.72.056127. Epub 2005 Nov 22.
5
Uncovering the overlapping community structure of complex networks in nature and society.揭示自然与社会中复杂网络的重叠群落结构。
Nature. 2005 Jun 9;435(7043):814-8. doi: 10.1038/nature03607.
6
Obtaining maximal concatenated phylogenetic data sets from large sequence databases.从大型序列数据库中获取最大串联系统发育数据集。
Mol Biol Evol. 2003 Jul;20(7):1036-42. doi: 10.1093/molbev/msg115. Epub 2003 May 30.
7
Biclustering of expression data.表达数据的双聚类分析
Proc Int Conf Intell Syst Mol Biol. 2000;8:93-103.