Kernel-bounded clustering for spatial transcriptomics enables scalable discovery of complex spatial domains.

Authors

Zhang Hang, Zhang Yi, Ting Kai Ming, Zhang Jie, Zhao Qiuran

Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China.

School of Artificial Intelligence, Nanjing University, Nanjing 210023, China.

Publication information

Genome Res. 2025 Feb 14;35(2):355-367. doi: 10.1101/gr.278983.124.

DOI: 10.1101/gr.278983.124

PMID: 39909714

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11874963/

Abstract

Spatial transcriptomics is a collection of technologies that have enabled characterization of gene expression profiles and spatial information in tissue samples. Existing methods for clustering spatial transcriptomics data have primarily focused on data transformation techniques to represent the data suitably for subsequent clustering analysis, often using an existing clustering algorithm. These methods have limitations in handling complex data characteristics with varying densities, sizes, and shapes (in the transformed space on which clustering is performed), and they have high computational complexity, resulting in unsatisfactory clustering outcomes and slow execution time even with GPUs. Rather than focusing on data transformation techniques, we propose a new clustering algorithm called kernel-bounded clustering (KBC). It has two unique features: (1) it is the first clustering algorithm that employs a distributional kernel to recruit members of a cluster, enabling clusters of varying densities, sizes, and shapes to be discovered, and (2) it is a linear-time clustering algorithm that significantly enhances the speed of clustering analysis, enabling researchers to effectively handle large-scale spatial transcriptomics data sets. We show that (1) KBC works well with a simple data transformation technique called the Weisfeiler-Lehman scheme, and (2) a combination of KBC and the Weisfeiler-Lehman scheme produces good clustering outcomes and is faster and easier to use than many methods that employ existing clustering algorithms and data transformation techniques.
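
The abstract names the Weisfeiler-Lehman scheme as the data transformation but does not spell out its form. As a rough illustration only, the sketch below assumes a continuous variant in which each spot's expression vector is repeatedly averaged with the mean of its spatial nearest neighbours. The function names `knn_indices` and `wl_smooth`, the choice of a k-nearest-neighbour graph, and the 0.5 mixing weight are all hypothetical choices made for this example, not details taken from the paper.

```python
import math


def knn_indices(coords, k):
    """For each spot, return the indices of its k nearest spatial neighbours.

    Brute-force search, quadratic in the number of spots; for illustration only.
    """
    neighbours = []
    for i, p in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        others.sort(key=lambda j: math.dist(p, coords[j]))
        neighbours.append(others[:k])
    return neighbours


def wl_smooth(features, neighbours, iterations=2):
    """Assumed continuous WL-style refinement.

    At each iteration a spot's vector becomes the average of its own vector
    and the mean vector of its neighbours (mixing weight 0.5 is an assumption).
    """
    h = [[float(v) for v in row] for row in features]
    for _ in range(iterations):
        new_h = []
        for i, row in enumerate(h):
            nbrs = neighbours[i]
            mean = [sum(h[j][d] for j in nbrs) / len(nbrs) for d in range(len(row))]
            new_h.append([0.5 * (a + b) for a, b in zip(row, mean)])
        h = new_h
    return h


# Toy data: three nearby spots with one expression profile, one distant spot with another.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
expr = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
smoothed = wl_smooth(expr, knn_indices(coords, k=2), iterations=1)
```

The smoothed vectors would then be passed to a clustering algorithm such as KBC; the kernel-based recruitment step itself is not sketched here because the abstract does not describe it in enough detail.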
