Department of Computer Science, Ying Wu College of Computing, New Jersey Institute of Technology, Newark, NJ 07102, United States.
Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, United States.
Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad641.
The recent development of spatially resolved transcriptomics (SRT) technologies has facilitated research on gene expression in its spatial context. Annotating cell types is a crucial step for downstream analysis. However, many existing algorithms use an unsupervised strategy to assign cell types in SRT data: they first conduct clustering analysis and then aggregate expression at the cluster level based on the clustering results. This workflow fails to use marker gene information efficiently. Conversely, cell annotation methods designed for single-cell RNA-seq data use cell-type marker gene information but ignore the spatial information in SRT data.
We introduce a statistical spatial transcriptomics cell assignment model, SPAN, to annotate clusters of cells or spots as known cell types in SRT data, using prior knowledge of predefined marker genes together with spatial information. SPAN relies on predefined overexpressed marker genes and combines a mixture model with a hidden Markov random field to model the spatial dependency between neighboring spots. We compare SPAN with spatial and nonspatial clustering algorithms in extensive simulations and real-data experiments and demonstrate its effectiveness.
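The idea of coupling a marker-driven mixture model with a hidden Markov random field can be illustrated with a minimal sketch. This is not the SPAN implementation: the Gaussian emission model, the Potts-style neighbor prior, the iterated conditional modes (ICM) update, and every name and parameter below (`annotate_spots`, `beta`, `delta`, `sigma`) are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def annotate_spots(expr, marker, neighbors, beta=1.0, delta=2.0, sigma=1.0, n_iter=10):
    """Toy marker-guided annotation with a Potts-style spatial prior (illustrative only).

    expr      : (n_spots, n_genes) normalized expression matrix
    marker    : (n_types, n_genes) binary matrix, 1 if the gene is a marker of the type
    neighbors : list of integer index arrays, neighbors[i] = spots adjacent to spot i
    beta      : assumed strength of the spatial smoothing term
    delta     : assumed overexpression of a marker gene above a baseline of 0
    """
    n_spots = expr.shape[0]
    n_types = marker.shape[0]
    # Expected expression per type: baseline 0, raised by delta for marker genes.
    means = delta * marker
    # Gaussian log-likelihood of each spot under each type (constants dropped).
    diff = expr[:, None, :] - means[None, :, :]
    loglik = -0.5 * np.sum(diff ** 2, axis=2) / sigma ** 2
    # Start from the nonspatial assignment, then refine with ICM sweeps.
    labels = np.argmax(loglik, axis=1)
    for _ in range(n_iter):
        changed = False
        for i in range(n_spots):
            # Count how many neighbors currently carry each label.
            counts = np.bincount(labels[neighbors[i]], minlength=n_types)
            best = int(np.argmax(loglik[i] + beta * counts))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Six spots on a line; spot 1 has ambiguous expression.
toy_expr = np.array([[2.0, 0.0], [0.9, 1.1], [2.0, 0.0],
                     [0.0, 2.0], [0.0, 2.0], [0.0, 2.0]])
toy_marker = np.array([[1, 0], [0, 1]])
toy_neighbors = [np.array([1]), np.array([0, 2]), np.array([1, 3]),
                 np.array([2, 4]), np.array([3, 5]), np.array([4])]
print(annotate_spots(toy_expr, toy_marker, toy_neighbors, beta=1.0).tolist())
print(annotate_spots(toy_expr, toy_marker, toy_neighbors, beta=0.0).tolist())
```

In this toy example, setting `beta=0.0` reduces the procedure to a purely expression-based assignment, so the ambiguous spot takes whichever type its own expression slightly favors. With `beta=1.0` its two neighbors pull it to their shared type, which is the kind of spatial dependency between neighboring spots that a hidden Markov random field is meant to capture.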