Wang Song, Liu Yuxi, Zhang Zhenhao, Ma Qin, Song Qianqian, Bian Jiang
bioRxiv. 2025 Jul 19:2025.07.01.662625. doi: 10.1101/2025.07.01.662625.
Recent advances in spatial transcriptomics (ST) technologies have revolutionized our understanding of cellular functions by providing gene expression profiles with rich spatial context. Effectively learning spatial representations is crucial for downstream analyses and requires robust integration of spatial information with transcriptomic data. While existing methods have shown promise, they often fail to adequately capture both local (neighbor-level) and global (tissue-wide) spatial contexts. Moreover, they tend to rely heavily on augmentation strategies, which can introduce noise and instability.
This study introduces GatorST, a versatile framework that combines graph-based modeling with pseudo-label-guided contrastive learning and meta-learning-inspired episodic training to generate spatially informed representations of ST data. GatorST is designed to improve several downstream tasks: spatial domain identification, gene expression imputation, batch effect removal, and trajectory inference.
GatorST constructs a spot-spot graph by connecting each spot (node) to its k nearest spatial neighbors and extracts two-hop neighborhood subgraphs to capture local context. At the global level, gene expression profiles are clustered using soft K-means to generate pseudo-labels, which serve as weak supervision signals within a contrastive learning framework. The contrastive objective pulls together embeddings that share a pseudo-label and pushes apart embeddings with different pseudo-labels. GatorST further adopts an episodic training strategy inspired by meta-learning: each episode consists of a support set used for contrastive optimization and a disjoint query set whose embeddings are classified against the pseudo-labels. This design trains the model to classify samples it did not see during contrastive optimization, which is intended to improve generalization to new spatial contexts.
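The local-context and contrastive steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the undirected treatment of edges, the SupCon-style form of the loss, and the `temperature` default are all assumptions made for illustration, and pseudo-labels are taken as given rather than computed with soft K-means.

```python
import math


def knn_graph(coords, k):
    """Connect each spot to its k nearest spatial neighbors (Euclidean distance)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(coords[i], coords[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            # Assumption: edges are treated as undirected.
            adj[i].add(j)
            adj[j].add(i)
    return adj


def two_hop_nodes(adj, center):
    """Return the node set of the two-hop neighborhood around `center`."""
    one_hop = set(adj[center])
    two_hop = set()
    for j in one_hop:
        two_hop |= adj[j]
    return {center} | one_hop | two_hop


def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Pseudo-label-guided contrastive loss in the style of SupCon.

    Embeddings sharing a label are treated as positives; all other
    embeddings in the batch act as negatives.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

For example, `two_hop_nodes(knn_graph(coords, k), i)` gives the spots from which a local subgraph around spot `i` could be induced, and the loss is lower when embeddings that share a pseudo-label lie close together than when they are scattered.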
Comprehensive comparisons with fifteen state-of-the-art methods across fourteen spatial transcriptomics datasets show that GatorST consistently achieves superior performance in identifying spatial domains, imputing gene expression, and removing batch effects. The results demonstrate the versatility and strong generalization of GatorST across diverse tissue types and experimental settings.
GatorST effectively integrates spatial topology and global gene expression through graph-based modeling, pseudo-labeling, and contrastive meta-learning. This framework generates biologically meaningful representations and significantly improves key downstream tasks, including spatial domain identification, gene expression imputation, batch effect removal, and trajectory inference.