Bian Chuang, Wang Xubin, Su Yanchi, Wang Yunhe, Wong Ka-Chun, Li Xiangtao
School of Artificial Intelligence, Jilin University, Changchun, 130000, Jilin, China.
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.
Comput Struct Biotechnol J. 2022 Apr 27;20:2181-2197. doi: 10.1016/j.csbj.2022.04.023. eCollection 2022.
With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool for revealing the wide heterogeneity between cells. Clustering is a fundamental task in this analysis: it discloses the transcriptomic profiles of single cells and is one of the key computational problems that has received widespread attention. Recently, many clustering algorithms have been developed for scRNA-seq data. Nevertheless, these computational models often suffer from practical limitations such as numerical instability, high dimensionality and poor computational scalability. Moreover, growing cell numbers and high dropout rates pose a major computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real scRNA-seq datasets. Based on this evaluation, we then propose an accurate single-cell data analysis method via Ensemble Feature Selection based Clustering, called scEFSC. The algorithm employs several unsupervised feature selection methods to remove genes that do not contribute significantly to the scRNA-seq data. Different scRNA-seq clustering algorithms are then applied to the data filtered by the multiple unsupervised feature selection methods, and the clustering results are combined using weight-based meta-clustering. We applied scEFSC to the fourteen real scRNA-seq datasets, and the experimental results demonstrated that scEFSC outperformed the other scRNA-seq clustering algorithms on several evaluation metrics. In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC.
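The pipeline described above (several unsupervised feature selections, clustering on each filtered matrix, then a weighted combination of the partitions) can be illustrated with a minimal pure-Python sketch. It is not the scEFSC implementation. The toy matrix, the two gene scores (variance and mean expression), the small k-means stand-in for the scRNA-seq clustering algorithms, the uniform weights, and the co-association threshold are all assumptions made for illustration; the abstract does not specify how scEFSC computes its weights.

```python
from itertools import combinations
from statistics import mean, pvariance

def top_genes(X, score, n_keep):
    """Indices of the n_keep highest-scoring genes (columns of X)."""
    scores = [score(col) for col in zip(*X)]
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order[:n_keep]

def kmeans(X, k, n_iter=20):
    """Tiny deterministic k-means (farthest-point init), a hypothetical
    stand-in for the scRNA-seq clustering algorithms in the ensemble."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [list(X[0])]
    while len(centers) < k:
        centers.append(list(max(X, key=lambda p: min(d2(p, c) for c in centers))))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in X]
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels

def weighted_consensus(partitions, weights, threshold=0.5):
    """Merge cells whose weighted co-clustering frequency exceeds threshold."""
    n = len(partitions[0])
    total = sum(weights)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(n), 2):
        co = sum(w for p, w in zip(partitions, weights) if p[i] == p[j]) / total
        if co > threshold:
            parent[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Toy expression matrix: 6 cells (rows) x 4 genes (columns); gene 3 is all zeros.
X = [
    [9.0, 8.0, 1.0, 0.0],
    [8.0, 9.0, 1.0, 0.0],
    [9.0, 9.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
]

# Two assumed unsupervised selectors, each keeping a different number of genes.
selections = [top_genes(X, pvariance, 2), top_genes(X, mean, 3)]

# Cluster each filtered matrix separately.
partitions = [kmeans([[row[g] for g in genes] for row in X], k=2)
              for genes in selections]

weights = [1.0, 1.0]  # assumed uniform; any non-negative weights work here
consensus = weighted_consensus(partitions, weights)
print(consensus)
```

In this sketch a partition with a larger weight contributes more to each pairwise co-clustering score, so an unreliable combination of selector and clustering algorithm can be down-weighted without being discarded.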