Gingerich Ian K, Goods Brittany A, Frost H Robert
Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA.
Thayer School of Engineering, Dartmouth College, Hanover, NH, USA.
bioRxiv. 2024 Dec 24:2024.12.20.629785. doi: 10.1101/2024.12.20.629785.
Spatial transcriptomics (ST) reveals how gene expression is spatially organized within tissues, enabling researchers to relate cellular environments to biological function. Identifying spatial domains within tissues is essential for understanding tissue architecture and the mechanisms underlying biological processes such as development and disease progression. Here, we present Randomized Spatial PCA (RASP), a novel spatially aware dimensionality reduction method for ST data. RASP is designed to be orders of magnitude faster than existing techniques, scale to ST data with hundreds of thousands of locations, support the flexible integration of non-transcriptomic covariates, and enable the reconstruction of de-noised and spatially smoothed expression values for individual genes. To achieve these goals, RASP uses a randomized two-stage principal component analysis (PCA) framework that leverages sparse matrix operations and configurable spatial smoothing. We compared the performance of RASP against five alternative methods (BASS, GraphST, SEDR, spatialPCA, and STAGATE) on four publicly available ST datasets from human and mouse tissues, generated using diverse techniques and resolutions (10x Visium, Stereo-Seq, MERFISH, and 10x Xenium). Our results demonstrate that RASP achieves tissue domain detection performance comparable or superior to existing methods while improving computational speed by several orders of magnitude. This efficiency facilitates the exploration of the increasingly high-resolution subcellular ST datasets now being generated.
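The abstract names the ingredients of the method (randomized PCA, a second PCA stage, sparse matrix operations, configurable smoothing, optional covariates) but not the exact algorithm. The Python sketch below is one plausible reading under stated assumptions, not the authors' implementation. The function names, the k-nearest-neighbor inverse-distance kernel, the parameters `n_neighbors` and `beta`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy import sparse
from scipy.spatial import cKDTree


def randomized_pca(X, k, n_oversample=10, n_iter=2, seed=0):
    """Approximate top-k PC scores of X (rows = locations) via a random sketch."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                       # center each feature
    m = min(k + n_oversample, min(Xc.shape))      # sketch size, capped by matrix rank
    Y = Xc @ rng.standard_normal((Xc.shape[1], m))
    for _ in range(n_iter):                       # power iterations sharpen the spectrum
        Y, _ = np.linalg.qr(Y)
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                        # orthonormal basis for the range of Xc
    U, s, _ = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return (Q @ U[:, :k]) * s[:k]                 # PC scores, shape (n_locations, k)


def knn_smoothing_matrix(coords, n_neighbors=6, beta=2.0):
    """Sparse row-normalized kernel (assumed form): inverse-distance weights on k-NN."""
    n = coords.shape[0]
    # Each location's nearest neighbor is itself (distance 0), hence n_neighbors + 1.
    dist, idx = cKDTree(coords).query(coords, k=n_neighbors + 1)
    weights = 1.0 / (1.0 + dist) ** beta
    rows = np.repeat(np.arange(n), n_neighbors + 1)
    W = sparse.csr_matrix((weights.ravel(), (rows, idx.ravel())), shape=(n, n))
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    return sparse.diags(1.0 / row_sums) @ W       # each row sums to 1


def two_stage_spatial_pca(X, coords, k=10, covariates=None, n_neighbors=6, beta=2.0):
    """Stage 1: randomized PCA. Smooth PCs spatially. Stage 2: PCA again."""
    pcs = randomized_pca(X, k)
    S = knn_smoothing_matrix(coords, n_neighbors=n_neighbors, beta=beta)
    smoothed = S @ pcs                            # sparse-by-dense product
    if covariates is not None:                    # optional non-transcriptomic covariates
        smoothed = np.hstack([smoothed, covariates])
    return randomized_pca(smoothed, k)


# Synthetic stand-in for an ST dataset: 500 locations, 100 genes.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(500, 2))
X = rng.poisson(2.0, size=(500, 100)).astype(float)
Z = two_stage_spatial_pca(X, coords, k=10)
print(Z.shape)  # (500, 10)
```

In this sketch the smoothing kernel stores only `n_neighbors + 1` nonzero entries per location, so memory and the smoothing product grow linearly with the number of locations. That is the kind of property that would let such an approach scale to hundreds of thousands of locations. The resulting low-dimensional matrix `Z` could then be passed to any clustering method for domain detection.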