Benchmarking sketching methods on spatial transcriptomics data.

Authors

Gingerich Ian K, Goods Brittany A, Frost H Robert

Affiliations

Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA.

Thayer School of Engineering, Dartmouth College, Hanover, NH, USA.

Publication information

bioRxiv. 2025 Sep 2:2025.08.26.672376. doi: 10.1101/2025.08.26.672376.

Abstract

High-throughput spatial transcriptomics (ST) now profiles hundreds of thousands of cells or locations per section, creating computational bottlenecks for routine analysis. Sketching, or intelligent sub-sampling, addresses scale by selecting small, representative subsets. While effective for scRNA-seq data, existing sketching methods, which optimize coverage in expression space but ignore physical location, can introduce spatial bias when applied to ST data. To explore the impact of sketching on ST analysis, we systematically benchmarked uniform sampling, leverage-score sampling, Geosketch (minimax/Hausdorff), and scSampler (maximin) across multiple real ST datasets (mouse ovary, MERFISH brain, human breast cancer, lung) and simulations, using three input representations: PCA embeddings, spatial coordinates, and spatially smoothed embeddings. We show that expression-only designs capture global transcriptomic heterogeneity but distort tissue architecture by over-sampling high-variability regions and under-sampling homogeneous areas. Coordinate-only sampling restores tissue coverage but misses transcriptional extremes. A simple spatially aware extension, computing leverage scores from a randomized SVD basis smoothed by a spatial weights matrix, strikes a favorable balance, recovering rare cell states while maintaining uniform tissue coverage and avoiding edge effects. Across robust Hausdorff distances, clustering stability (ARI), PCA loading drift, and local cell-type MSE, spatially smoothed leverage scores match or outperform alternatives. These results motivate joint spatial-transcriptomic sketching objectives to enable fast, unbiased analyses of increasingly large ST datasets.
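The spatially smoothed leverage-score idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the spatial weights matrix is built or how the smoothing is applied, so the row-normalized k-nearest-neighbor weights, the mixing parameter `alpha`, the Gaussian range-finder form of randomized SVD, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def knn_weights(coords, k=6):
    """Row-normalized k-nearest-neighbor spatial weights (assumed form).

    Dense n x n matrix, so only suitable for small n; a sparse matrix
    would be needed at real ST scale.
    """
    n = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)              # exclude self-neighbors
    nbrs = np.argsort(d2, axis=1)[:, :k]      # k closest locations per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0 / k
    return W

def randomized_basis(X, rank, rng, oversample=10):
    """Approximate top left singular vectors via a Gaussian range finder."""
    omega = rng.standard_normal((X.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(X @ omega)            # orthonormal basis for range of X
    U_small, _, _ = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U_small)[:, :rank]

def smoothed_leverage_sketch(X, coords, size, rank=10, alpha=0.5, k=6, seed=0):
    """Sample `size` rows with probability proportional to leverage scores
    computed from a spatially smoothed basis (hypothetical formulation)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                   # center features
    U = randomized_basis(Xc, rank, rng)
    W = knn_weights(coords, k)
    # Assumed smoothing: blend each row of U with its neighbors' average.
    U_s = (1 - alpha) * U + alpha * (W @ U)
    scores = (U_s ** 2).sum(axis=1)           # squared row norms as leverage
    probs = scores / scores.sum()
    idx = rng.choice(X.shape[0], size=size, replace=False, p=probs)
    return idx, probs
```

In this sketch, `alpha = 0` reduces to ordinary expression-only leverage-score sampling, while larger values pull each location's score toward its neighborhood average, which is one plausible way to temper over-sampling of high-variability regions.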

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c3b/12407997/c23a8ccd3e0a/nihpp-2025.08.26.672376v2-f0001.jpg
