Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China.
Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae198.
Clustering analysis of single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering that infers from large-scale scRNA-seq data an accurate number of clusters, one reflecting the underlying biology, remains challenging.
Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes.
scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
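To make the idea of adaptive clustering concrete, the following is a minimal sketch and not the scDAC implementation. It assumes two substitutions: PCA stands in for the autoencoder, and scikit-learn's truncated variational Dirichlet process mixture (`BayesianGaussianMixture`) stands in for the DPMM. The two stages are fitted one after the other rather than jointly optimized as in scDAC. The function name `adaptive_cluster`, its parameters, and the toy count matrix are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import BayesianGaussianMixture


def adaptive_cluster(counts, n_latent=10, max_clusters=20, seed=0):
    """Cluster cells without fixing the number of clusters in advance.

    counts: (n_cells, n_genes) array of non-negative expression counts.
    Returns (labels, n_clusters), where n_clusters is the number of
    mixture components that actually received cells.
    """
    # Library-size normalisation followed by log1p, a common scRNA-seq step.
    lib_size = counts.sum(axis=1, keepdims=True)
    log_expr = np.log1p(counts / np.maximum(lib_size, 1.0) * 1e4)

    # Assumption: PCA replaces the autoencoder as the embedding step.
    latent = PCA(n_components=n_latent, random_state=seed).fit_transform(log_expr)

    # Truncated Dirichlet process mixture: max_clusters is only an upper
    # bound; components that are not needed end up with negligible weight.
    dpmm = BayesianGaussianMixture(
        n_components=max_clusters,
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="diag",
        max_iter=500,
        random_state=seed,
    )
    labels = dpmm.fit_predict(latent)
    return labels, len(np.unique(labels))


# Toy data (hypothetical): 3 simulated cell types, 100 cells each, 50 genes.
rng = np.random.default_rng(0)
profiles = rng.gamma(shape=2.0, scale=2.0, size=(3, 50))
true_labels = np.repeat(np.arange(3), 100)
counts = rng.poisson(profiles[true_labels] * 5).astype(float)

labels, n_clusters = adaptive_cluster(counts)
print("inferred number of clusters:", n_clusters)
```

The point of the sketch is only that the mixture is given an upper bound on the number of components and decides how many to use from the data. Variational Dirichlet process mixtures of this kind can over-split clusters, so the inferred count on real data should not be read as a reproduction of the results reported for scDAC.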