MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Institute for Precision Medicine & Department of Automation, Tsinghua University, Beijing 100084, China.
Department of Finance, Shanghai Advanced Institute of Finance, Shanghai Jiao Tong University, Shanghai 200240, China.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae028.
Single-cell RNA-seq (scRNA-seq) is a powerful technique for decoding the complex cellular composition of the tumor microenvironment (TME). Because previous studies have defined many meaningful cell subtypes in several tumor types, there is a great need to computationally transfer these labels to new datasets. In addition, different studies have used different approaches or criteria to define cell subtypes within the same major cell lineages, so the relationships between subtypes defined in different studies should be carefully evaluated. scCancer2 is an updated package for integrative tumor scRNA-seq data analysis. First, we developed a supervised machine learning framework that annotates TME cells using the cell subtype labels from 15 scRNA-seq datasets comprising 594 samples in total. Using the trained classifiers, we quantitatively constructed similarity maps between the cell subtypes defined in different references by testing on all 15 datasets. Second, to improve the identification of malignant cells, we designed a classifier that integrates large-scale pan-cancer TCGA bulk gene expression datasets with scRNA-seq datasets (10 cancer types, 175 samples, 663,857 cells). This classifier performs robustly when no high-confidence internal reference cells are available. Third, scCancer2 integrates a module that processes spatial transcriptomic data and analyzes the spatial features of the TME.
The package and user documentation are available at http://lifeome.net/software/sccancer2/ and https://doi.org/10.5281/zenodo.10477296.
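As an illustration of what a similarity map between subtypes from two references can look like, the following minimal Python sketch tabulates, for each subtype in one annotation, the fraction of its cells that a classifier trained on another reference assigns to each of that reference's subtypes. It is not the scCancer2 implementation: the function name `label_similarity_map`, the row-normalized co-occurrence measure, and the toy subtype labels are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def label_similarity_map(labels_a, labels_b):
    """Row-normalized co-occurrence between two labelings of the same cells.

    labels_a[i]: subtype of cell i in its original annotation (reference A).
    labels_b[i]: subtype predicted for cell i by a classifier trained on
                 a different reference (reference B).
    Returns {a_label: {b_label: fraction of a_label cells given b_label}}.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")

    # Count how often each (A subtype, B subtype) pair occurs.
    counts = defaultdict(Counter)
    for a, b in zip(labels_a, labels_b):
        counts[a][b] += 1

    # Normalize each row so the fractions for one A subtype sum to 1.
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


# Toy example with hypothetical subtype names (not taken from the paper).
labels_a = ["CD8_Tex", "CD8_Tex", "CD8_Tex", "Treg", "Treg", "CD8_Tem"]
labels_b = ["T_exhausted", "T_exhausted", "T_cytotoxic",
            "T_reg", "T_reg", "T_cytotoxic"]

sim = label_similarity_map(labels_a, labels_b)
for a_label, row in sim.items():
    print(a_label, row)
```

In this toy case, two of the three `CD8_Tex` cells map to `T_exhausted`, which suggests the two labels describe overlapping populations. A real analysis would use the predictions of trained classifiers over full datasets rather than hand-written label lists.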