Hussain Md Taufique, Halappanavar Mahantesh, Chatterjee Samrat, Radicchi Filippo, Fortunato Santo, Azad Ariful
Department of Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA.
Data Sciences and Machine Intelligence Group, Pacific Northwest National Laboratory, Richland, WA, USA.
Sci Rep. 2025 Jan 30;15(1):3788. doi: 10.1038/s41598-025-87479-6.
We develop an algorithm that finds the consensus among many different clustering solutions of a graph. We formulate the problem as a median set partitioning problem and propose a greedy optimization technique. Unlike other approaches that find median set partitions, our algorithm takes graph structure into account and finds a comparable quality solution much faster than the other approaches. For graphs with known communities, our consensus partition captures the actual community structure more accurately than alternative approaches. To make it applicable to large graphs, we remove sequential dependencies from our algorithm and design a parallel algorithm. Our parallel algorithm achieves 35x speedup when utilizing 64 processing cores for large real-world graphs representing mass cytometry data from single-cell experiments.
我们开发了一种算法,该算法能在图的许多不同聚类解决方案中找到共识。我们将该问题表述为一个中位数集划分问题,并提出了一种贪婪优化技术。与其他寻找中位数集划分的方法不同,我们的算法考虑了图结构,并且比其他方法更快地找到质量相当的解决方案。对于具有已知社区的图,我们的共识划分比其他方法更准确地捕捉实际的社区结构。为了使其适用于大型图,我们从算法中消除了顺序依赖性,并设计了一种并行算法。当利用64个处理核心处理表示来自单细胞实验的质谱流式细胞术数据的大型真实世界图时,我们的并行算法实现了35倍的加速。