Sun Jieran, Biharie Kirti, Cai Peiying, Müller-Bötticher Niklas, Kiessling Paul, Turner Meghan A, Dam Søren H, Heyl Florian, Kathirchelvan Sarusan, Emons Martin, Gunz Samuel, Twardziok Sven, El-Heliebi Amin, Zacharias Martin, Eils Roland, Reinders Marcel, Gottardo Raphael, Kuppe Christoph, Long Brian, Mahfouz Ahmed, Robinson Mark D, Ishaque Naveed
Biomedical Data Science Center, Centre Hospitalier Universitaire Vaudois, Rue du Bugnon 21, 1011 Lausanne, Switzerland.
Department of Human Genetics, Leiden University Medical Center, Einthovenweg 20, 2333 ZC Leiden, Netherlands.
bioRxiv. 2025 Jun 27:2025.06.23.660861. doi: 10.1101/2025.06.23.660861.
Spatial omics technologies have revolutionized the study of tissue architecture and cellular heterogeneity by integrating molecular profiles with spatial localization. In spatially resolved transcriptomics, delineating higher-order anatomical structures is critical for understanding how cellular organization affects tissue and organ function. Since 2020, more than 50 spatially aware clustering (SAC) methods have been developed for this purpose. However, the reliability of current benchmarks is undermined by their narrow focus on Visium and brain tissue datasets, as well as incorrect interpretation of manual annotation as ground truth. Here, we present SACCELERATOR, a community-driven, extensible framework that standardizes data formatting, method integration, and metric evaluation, and is designed to rapidly incorporate new methods and datasets. SACCELERATOR currently includes 22 SAC methods applied to 15 datasets spanning 9 technologies and diverse tissue types. Our analysis revealed substantial limitations in the generalizability and reproducibility of SAC methods across tissues and platforms. We also demonstrate that anatomical labels commonly used as ground truths are often biased, potentially error-prone, and, in some cases, unsuitable for benchmarking efforts. Rather than scoring and comparing methods, we propose a consensus-guided workflow that aggregates clustering results to generate consensus representations. Descriptive spatial metrics highlight areas of high entropy where method disagreement is highest, enabling targeted feedback for tissue experts. Applied to brain and cancer datasets, this approach uncovered biologically meaningful patterns overlooked by individual methods and manual annotations. Our results underscore the need for iterative, expert-in-the-loop analysis and reveal that traditional evaluation metrics do not always capture the subjective qualities of results. By improving tissue annotation and addressing key benchmarking limitations, SACCELERATOR provides a robust foundation for advancing spatial omics research.
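The idea of aggregating clustering results into a consensus and flagging high-entropy regions can be illustrated with a minimal Python sketch. This is not SACCELERATOR's implementation, which the abstract does not specify. It is one simple way to realize the idea under stated assumptions: the function names `align_labels` and `consensus_and_entropy` are hypothetical, every method is assumed to return the same number of clusters, labels are aligned to the first method by Hungarian matching, the consensus is a plain majority vote, and disagreement is measured as per-spot Shannon entropy of the votes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so its clusters overlap as much as possible with `reference`."""
    # Contingency table: rows index clusters in `labels`, columns index reference clusters.
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(overlap, (labels, reference), 1)
    # Hungarian matching on the negated table maximizes total overlap.
    rows, cols = linear_sum_assignment(-overlap)
    mapping = np.empty(n_clusters, dtype=int)
    mapping[rows] = cols
    return mapping[labels]


def consensus_and_entropy(label_matrix, n_clusters):
    """Majority-vote consensus and per-spot entropy (in bits).

    label_matrix: (n_methods, n_spots) integer labels in [0, n_clusters).
    Assumption: the first row serves as the alignment reference.
    """
    label_matrix = np.asarray(label_matrix)
    reference = label_matrix[0]
    aligned = np.vstack(
        [reference]
        + [align_labels(reference, row, n_clusters) for row in label_matrix[1:]]
    )
    # votes[k, s] = number of methods that assign spot s to cluster k.
    votes = np.stack([(aligned == k).sum(axis=0) for k in range(n_clusters)])
    probs = votes / aligned.shape[0]
    consensus = probs.argmax(axis=0)
    # Shannon entropy per spot; zero-probability terms contribute nothing.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    entropy = -terms.sum(axis=0)
    return consensus, entropy


# Toy input: three methods, six spots, two clusters.
# The second method finds the same partition as the first with swapped label names.
toy_labels = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus, entropy = consensus_and_entropy(toy_labels, n_clusters=2)
print(consensus)
print(entropy.round(3))
```

In this toy input only the third spot receives a split vote, so it is the only spot with non-zero entropy; in a real tissue section such spots would be the ones flagged for expert review. A majority vote over aligned labels is the simplest aggregation choice. Co-association matrices or other ensemble clustering schemes would avoid the need for a fixed cluster count.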