Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA.
The Pelotonia Institute for Immuno-Oncology, The Ohio State University Comprehensive Cancer Center, Columbus, Ohio, USA.
Biometrics. 2023 Sep;79(3):1775-1787. doi: 10.1111/biom.13727. Epub 2022 Aug 10.
High-throughput spatial transcriptomics (HST) is a rapidly emerging class of experimental technologies that profile gene expression in tissue samples at or near single-cell resolution while retaining the spatial location of each sequencing unit within the tissue sample. By analyzing HST data, we seek to identify sub-populations of cells within a tissue sample that may inform biological phenomena. Existing computational methods either ignore the spatial heterogeneity in gene expression profiles, fail to account for important statistical features such as skewness, or are heuristic-based network clustering methods that lack the inferential benefits of statistical modeling. To address this gap, we develop SPRUCE: a Bayesian spatial multivariate finite mixture model based on multivariate skew-normal distributions, which is capable of identifying distinct cellular sub-populations in HST data. We further implement a novel combination of Pólya-Gamma data augmentation and spatial random effects to infer spatially correlated mixture component membership probabilities without relying on approximate inference techniques. Via a simulation study, we demonstrate the detrimental inferential effects of ignoring skewness or spatial correlation in HST data. On publicly available human brain HST data, SPRUCE outperforms existing methods in recovering expertly annotated brain layers. Finally, our application of SPRUCE to human breast cancer HST data indicates that it can separate distinct cell populations within the tumor microenvironment. An R package, spruce, for fitting the proposed models is available through The Comprehensive R Archive Network.
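To make the role of skewness in a finite mixture concrete, the following is a minimal illustrative sketch, not the SPRUCE implementation and not the spruce package API. It assumes a simplified univariate setting and the standard stochastic representation of a skew-normal draw, y = xi + psi*|z| + sigma*eps with z and eps independent standard normal. The function names, parameter values, and two-component setup are all hypothetical, and the sketch omits the spatial random effects and Pólya-Gamma augmentation described above.

```python
import random
import statistics


def sample_skew_normal(xi, psi, sigma, rng):
    """Draw one skew-normal value via y = xi + psi*|z| + sigma*eps."""
    z = abs(rng.gauss(0.0, 1.0))   # half-normal latent variable drives the skew
    eps = rng.gauss(0.0, 1.0)      # symmetric Gaussian noise
    return xi + psi * z + sigma * eps


def sample_mixture(n, weights, params, seed=0):
    """Simulate n draws from a finite mixture of skew-normal components.

    params[k] = (xi, psi, sigma) for component k (hypothetical values).
    Returns the component labels and the simulated values.
    """
    rng = random.Random(seed)
    labels, values = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        xi, psi, sigma = params[k]
        labels.append(k)
        values.append(sample_skew_normal(xi, psi, sigma, rng))
    return labels, values


def sample_skewness(xs):
    """Standardized third central moment of a sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Two components: one right-skewed (psi > 0), one left-skewed (psi < 0).
labels, values = sample_mixture(
    5000, [0.6, 0.4], [(0.0, 3.0, 0.5), (10.0, -3.0, 0.5)]
)
comp0 = [v for v, k in zip(values, labels) if k == 0]
comp1 = [v for v, k in zip(values, labels) if k == 1]
print(sample_skewness(comp0), sample_skewness(comp1))
```

In this toy setting, each component's sample skewness takes the sign of its psi parameter, which is the kind of asymmetry a Gaussian mixture component cannot represent.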