Suppr超能文献

大型稀疏二分图的(p,q)双团计数与枚举

(p,q)-biclique counting and enumeration for large sparse bipartite graphs.

作者信息

Yang Jianye, Peng Yun, Ouyang Dian, Zhang Wenjie, Lin Xuemin, Zhao Xiang

机构信息

Guangzhou University, Guangzhou, China.

The University of New South Wales, Sydney, Australia.

出版信息

VLDB J. 2023 Mar 13:1-25. doi: 10.1007/s00778-023-00786-0.

Abstract

In this paper, we study the problem of (, )-biclique counting and enumeration for large sparse bipartite graphs. Given a bipartite graph and two integer parameters and , we aim to efficiently count and enumerate all (, )-bicliques in , where a (, )-biclique (, ) is a complete subgraph of with , , , and . The problem of (, )-biclique counting and enumeration has many applications, such as graph neural network information aggregation, densest subgraph detection, and cohesive subgroup analysis. Despite the wide range of applications, to the best of our knowledge, we note that there is no efficient and scalable solution to this problem in the literature . This problem is computationally challenging, due to the worst-case exponential number of (, )-bicliques. In this paper, we propose a competitive branch-and-bound baseline method, namely BCList, which explores the search space in a depth-first manner, together with a variety of pruning techniques. Although BCList offers a useful computation framework to our problem, its worst-case time complexity is exponential to . To alleviate this, we propose an advanced approach, called BCList++. Particularly, BCList++ applies a layer-based exploring strategy to enumerate (, )-bicliques by anchoring the search on either or only, which has a worst-case time complexity exponential to either or only. Consequently, a vital task is to choose a layer with the least computation cost. To this end, we develop a cost model, which is built upon an unbiased estimator for the density of 2-hop graph induced by or . To improve computation efficiency, BCList++ exploits pre-allocated arrays and vertex labeling techniques such that the frequent subgraph creating operations can be substituted by array element switching operations. We conduct extensive experiments on 16 real-life datasets, and the experimental results demonstrate that BCList++ significantly outperforms the baseline methods by up to 3 orders of magnitude. We show via a case study that (, )-bicliques optimizes the efficiency of graph neural networks. In this paper, we extend our techniques to count and enumerate (, )-bicliques on uncertain bipartite graphs. An efficient method IUBCList is developed on the top of BCList++, together with a couple of pruning techniques, including common neighbor refinement and search branch early termination, to discard unpromising uncertain (, )-bicliques early. The experimental results demonstrate that IUBCList significantly outperforms the baseline method by up to 2 orders of magnitude.

摘要

在本文中,我们研究了大型稀疏二分图的(, )-双团计数与枚举问题。给定一个二分图以及两个整数参数 和 ,我们旨在高效地计数并枚举图 中所有的(, )-双团,其中(, )-双团(, )是图 的一个完全子图,满足 , , , 。(, )-双团计数与枚举问题有许多应用,如图神经网络信息聚合、最密集子图检测和凝聚子群分析。尽管应用广泛,但据我们所知,文献中尚无针对此问题的高效且可扩展的解决方案。由于(, )-双团的数量在最坏情况下呈指数级增长,该问题在计算上具有挑战性。在本文中,我们提出了一种具有竞争力的分支定界基线方法,即BCList,它以深度优先的方式探索搜索空间,并结合了多种剪枝技术。尽管BCList为我们的问题提供了一个有用的计算框架,但其最坏情况下的时间复杂度与 呈指数关系。为了缓解这一问题,我们提出了一种先进的方法,称为BCList++。具体而言,BCList++应用基于层的探索策略,通过仅将搜索锚定在 或 上来枚举(, )-双团,其最坏情况下的时间复杂度仅与 或 呈指数关系。因此,一项关键任务是选择计算成本最低的层。为此,我们开发了一个成本模型,该模型基于对由 或 诱导的2-跳图密度的无偏估计器构建。为了提高计算效率,BCList++利用预分配数组和顶点标记技术,使得频繁的子图创建操作可以被数组元素切换操作所替代。我们在16个真实数据集上进行了广泛的实验,实验结果表明BCList++比基线方法性能显著提升,高达3个数量级。通过一个案例研究,我们表明(, )-双团优化了图神经网络的效率。在本文中,我们将我们的技术扩展到对不确定二分图上的(, )-双团进行计数和枚举。在BCList++的基础上开发了一种高效方法IUBCList,并结合了一些剪枝技术,包括共同邻居细化和搜索分支提前终止,以尽早舍弃没有前景的不确定(, )-双团。实验结果表明,IUBCList比基线方法性能显著提升,高达2个数量级。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cab/10008723/ffabddbd8187/778_2023_786_Fig5_HTML.jpg

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验