Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA.
Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
Genome Biol. 2023 Oct 12;24(1):228. doi: 10.1186/s13059-023-03062-0.
Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, because of foundational issues in how clusters are defined and detected, they are not always reliable, which leads to unstable conclusions. We compare popular clustering algorithms, including a new consensus clustering algorithm, SpeakEasy2: Champagne, across thousands of synthetic and real biological datasets. These tests identify trends in performance, show that no single method is universally optimal, and allow us to examine the factors behind variation in performance. Multiple metrics indicate that SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.