Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States.
Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad449.
Single-cell RNA-sequencing (scRNA-seq) has enabled the molecular profiling of thousands to millions of cells simultaneously in biologically heterogeneous samples. Currently, the common practice in scRNA-seq is to determine cell type labels through unsupervised clustering and the examination of cluster-specific genes. However, even small differences in analysis and parameter choice can greatly alter clustering results and thus strongly influence which cell types are identified. Existing methods largely focus on determining the optimal number of robust clusters, which can be problematic for identifying cells of extremely low abundance because they contribute only subtly to overall patterns of gene expression.
Here, we present SCISSORS, a framework that accurately profiles subclusters within broad clusters to identify rare cell types in scRNA-seq data. SCISSORS uses silhouette scoring to estimate the heterogeneity of clusters and reveals rare cells in heterogeneous clusters through a multi-step, semi-supervised reclustering process. Additionally, SCISSORS provides a method for identifying marker genes that are highly specific to a cell type. SCISSORS is built around the popular Seurat R package and can be easily integrated into existing Seurat pipelines.
SCISSORS, including source code and vignettes, is freely available at https://github.com/jr-leary7/SCISSORS.
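To illustrate the idea of using silhouette scores to estimate cluster heterogeneity, the following is a minimal, dependency-free Python sketch. It is not the SCISSORS implementation or its R API: the function names `silhouette_per_cluster` and `flag_heterogeneous`, the toy 2-D coordinates, and the cutoff of 0.7 are all assumptions made for illustration. In a real analysis, distances would typically be computed on a reduced-dimensional embedding of the cells rather than on raw coordinates like these.

```python
import math
from collections import defaultdict


def silhouette_per_cluster(points, labels):
    """Return {cluster label: mean silhouette width of its members}.

    Hypothetical helper for illustration; uses Euclidean distance.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette scores need at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to every other point in `members`.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    widths = defaultdict(list)
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            s = 0.0  # convention: a singleton cluster has silhouette width 0
        else:
            a = mean_dist(i, own)  # cohesion within the point's own cluster
            b = min(mean_dist(i, members)  # separation from the nearest other cluster
                    for other, members in clusters.items() if other != lab)
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        widths[lab].append(s)
    return {lab: sum(v) / len(v) for lab, v in widths.items()}


def flag_heterogeneous(scores, cutoff=0.7):
    """Return labels whose mean silhouette falls below an assumed cutoff."""
    return sorted(lab for lab, s in scores.items() if s < cutoff)


# Toy data: cluster "A" is compact; cluster "B" hides a small, distant subgroup.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (20, 20), (20, 21)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

scores = silhouette_per_cluster(points, labels)
candidates = flag_heterogeneous(scores)  # clusters worth reclustering
print(scores, candidates)
```

In this toy example, the broad cluster "B" scores lower than the compact cluster "A" because two of its members sit far from the rest. That is the kind of signal that could mark a cluster as a candidate for a further round of reclustering.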