Biomedical Data Science, Dartmouth College, Lebanon, NH 03766, USA
Pac Symp Biocomput. 2022;27:199-210.
Inferring the cell types in single-cell RNA-sequencing (scRNA-seq) data is of particular importance for understanding the potential cellular mechanisms and phenotypes occurring in complex tissues, such as the tumor-immune microenvironment (TME). The sparsity and noise of scRNA-seq data, combined with the fact that immune cell types often occur on a continuum, make cell typing of TME scRNA-seq data a significant challenge. Several single-label cell typing methods have been put forth to address the limitations of noise and sparsity, but accounting for the often overlapping spectrum of cell types in the immune TME remains an obstacle. To address this, we developed a new scRNA-seq cell-typing method, Cell-typing using variance Adjusted Mahalanobis distances with Multi-Labeling (CAMML). CAMML leverages cell type-specific weighted gene sets to score every cell in a dataset for every potential cell type. This allows cells to be labeled either by their highest-scoring cell type, giving a single-label classification, or by a score cut-off, giving a multi-label classification. For single-label cell typing, CAMML performance is comparable to that of the existing cell typing methods SingleR and Garnett. For scenarios where cells may exhibit features of multiple cell types (e.g., undifferentiated cells), the multi-label classification supported by CAMML offers important benefits relative to the current state-of-the-art methods. By integrating data across studies, omics platforms, and species, CAMML serves as a robust and adaptable method for overcoming the challenges of scRNA-seq analysis.
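The two labeling modes described in the abstract can be illustrated with a minimal sketch. It assumes a cells-by-cell-types score matrix is already available; the scores, the cell type names, the 0.5 cut-off, and the helper names `single_label` and `multi_label` are all hypothetical and chosen only for illustration. The sketch does not compute variance-adjusted Mahalanobis distances and is not the CAMML implementation.

```python
import numpy as np

def single_label(scores, cell_types):
    """Label each cell with its highest-scoring cell type."""
    best = np.argmax(scores, axis=1)  # index of the top score in each row
    return [cell_types[i] for i in best]

def multi_label(scores, cell_types, cutoff):
    """Label each cell with every cell type whose score meets the cut-off."""
    return [
        [ct for ct, s in zip(cell_types, row) if s >= cutoff]
        for row in scores
    ]

# Hypothetical scores: rows are cells, columns are candidate cell types.
cell_types = ["T cell", "B cell", "NK cell"]
scores = np.array([
    [0.95, 0.10, 0.20],  # clearly one type
    [0.60, 0.05, 0.70],  # features of two types
    [0.15, 0.85, 0.10],  # clearly one type
])

single = single_label(scores, cell_types)
multi = multi_label(scores, cell_types, cutoff=0.5)  # cut-off is an assumption
print(single)
print(multi)
```

In this toy example the second cell receives one label under the single-label rule but two labels under the cut-off rule, which is the situation the multi-label classification is meant to capture.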