College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China.
Moorestown High School, Moorestown, NJ 08057, USA.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad335.
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq datasets pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing methods are limited in capturing the precise distribution of the data or in fully utilizing the relationships between cells. This leaves considerable scope for improving clustering performance, particularly in detecting rare cell populations in large scRNA-seq datasets. We introduce DeepScena, a novel single-cell hierarchical clustering tool that combines nonlinear dimension reduction, a negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In a comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena is highly proficient at identifying rare cell populations within large datasets that contain many clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena identified not only the previously labeled large cell types but also subpopulations within CD14 monocytes, T cells and natural killer cells.
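To illustrate what fitting count data with a negative binomial model involves, the following is a minimal sketch of a negative binomial reconstruction loss of the kind an autoencoder could minimize. It is not taken from DeepScena: the abstract does not specify the loss, so the mean/dispersion parameterization, the shared dispersion value, and the function names `nb_log_likelihood` and `nb_reconstruction_loss` are all assumptions made for illustration.

```python
import math


def nb_log_likelihood(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution.

    Assumed parameterization (not from the source): mean mu > 0 and
    dispersion theta > 0, so that the variance is mu + mu**2 / theta.
    """
    total = theta + mu
    # Log of the normalizing term Gamma(x + theta) / (Gamma(theta) * x!)
    log_norm = math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
    # Contribution of the dispersion term (theta / (theta + mu)) ** theta
    log_theta_term = theta * (math.log(theta) - math.log(total))
    # Contribution of the mean term (mu / (theta + mu)) ** x
    log_mu_term = x * (math.log(mu) - math.log(total))
    return log_norm + log_theta_term + log_mu_term


def nb_reconstruction_loss(counts, means, theta):
    """Mean negative log-likelihood over a cells-by-genes count matrix.

    counts: observed counts (list of rows, one row per cell).
    means:  positive means predicted by a hypothetical decoder, same shape.
    theta:  a single shared dispersion value (a simplifying assumption).
    """
    total, n = 0.0, 0
    for row_x, row_mu in zip(counts, means):
        for x, mu in zip(row_x, row_mu):
            total -= nb_log_likelihood(x, mu, theta)
            n += 1
    return total / n


# Toy example: two cells, three genes, with many zeros as in scRNA-seq data.
counts = [[0, 3, 0], [1, 0, 7]]
means = [[0.2, 2.5, 0.1], [0.8, 0.3, 6.0]]
loss = nb_reconstruction_loss(counts, means, theta=1.5)
```

A negative binomial likelihood is a common choice for scRNA-seq counts because its variance can exceed its mean, which a Poisson model cannot represent. In practice the dispersion is often learned per gene rather than fixed as in this sketch.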