IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2007-2015. doi: 10.1109/TCBB.2022.3230098. Epub 2023 Jun 5.
Advances in single-cell RNA sequencing (scRNA-seq) technologies allow researchers to analyze genome-wide transcription profiles and to address biological problems at single-cell resolution. However, existing clustering methods for scRNA-seq data suffer from the high dropout rate and the curse of dimensionality inherent in the data. Here, we propose a novel pipeline, scBKAP, the cornerstone of which is a single-cell bisecting K-means clustering method based on an autoencoder network and a dimensionality reduction model, MPDR. Specifically, scBKAP uses an autoencoder network to reconstruct gene expression values from scRNA-seq data to alleviate the dropout issue, and the MPDR model, which combines the M3Drop feature selection algorithm with the PHATE dimensionality reduction algorithm, to reduce the dimensionality of the reconstructed data. The dimensionality-reduced data are then fed into the bisecting K-means clustering algorithm to identify cell clusters. Comprehensive experiments demonstrate scBKAP's superior performance over nine state-of-the-art single-cell clustering methods on 21 public scRNA-seq datasets and on simulated datasets. The source code and datasets are available at https://github.com/YuBinLab-QUST/scBKAP/ and https://doi.org/10.24433/CO.4592131.v1.
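The final clustering stage can be illustrated with a minimal, generic sketch of bisecting K-means in plain Python. This is not the scBKAP implementation: the function names (`two_means`, `bisecting_kmeans`), the rule of always splitting the cluster with the largest within-cluster sum of squared errors, the number of restarts (`n_trials`), and the toy 2-D points standing in for PHATE-reduced cells are all assumptions made for illustration. The autoencoder, M3Drop, and PHATE steps are not shown.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, rng, n_iter=20):
    """Lloyd's algorithm with k=2; returns two lists of indices into points."""
    centers = rng.sample(points, 2)  # two data points as initial centers
    left, right = [], []
    for _ in range(n_iter):
        left, right = [], []
        for i, p in enumerate(points):
            near_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            (left if near_first else right).append(i)
        if not left or not right:
            break  # degenerate split; the caller discards it
        new_centers = [centroid([points[i] for i in left]),
                       centroid([points[i] for i in right])]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k, seed=0, n_trials=5):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(range(len(points)))]  # start with all points in one cluster
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        target = max(splittable, key=lambda c: sse([points[i] for i in c]))
        sub = [points[i] for i in target]
        best = None
        for _ in range(n_trials):  # keep the lowest-SSE bisection over several restarts
            left, right = two_means(sub, rng)
            if not left or not right:
                continue
            cost = sse([sub[i] for i in left]) + sse([sub[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break  # no valid split found (e.g. all points identical)
        clusters.remove(target)
        clusters.append([target[i] for i in best[1]])
        clusters.append([target[i] for i in best[2]])
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy stand-in for dimensionality-reduced cells: three well-separated groups.
points = [[0, 0], [1, 2], [2, 1],
          [10, 0], [11, 2], [12, 1],
          [30, 0], [31, 2], [32, 1]]
labels = bisecting_kmeans(points, k=3)
print(labels)
```

Because each step only ever solves a two-cluster problem, the procedure builds the partition top-down; the numeric label assigned to each group depends on the order in which clusters are split.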