Bian Chuang, Wang Xubin, Su Yanchi, Wang Yunhe, Wong Ka-Chun, Li Xiangtao
School of Artificial Intelligence, Jilin University, Changchun, 130000, Jilin, China.
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China.
Comput Struct Biotechnol J. 2022 Apr 27;20:2181-2197. doi: 10.1016/j.csbj.2022.04.023. eCollection 2022.
With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool for revealing heterogeneity between cells. Clustering is a fundamental task in this analysis: it groups single cells by their transcriptomic profiles and is a key computational problem that has received widespread attention. Many clustering algorithms have recently been developed for scRNA-seq data. Nevertheless, these computational models often suffer from practical limitations such as numerical instability, high dimensionality, and poor computational scalability. Moreover, growing cell numbers and high dropout rates pose a substantial computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real scRNA-seq datasets. Based on this evaluation, we then propose an accurate single-cell data analysis method via Ensemble Feature Selection based Clustering, called scEFSC. The algorithm employs several unsupervised feature selection methods to remove genes that do not contribute significantly to the scRNA-seq data. Different scRNA-seq clustering algorithms are then applied to the data filtered by each feature selection method, and the clustering results are combined using weight-based meta-clustering. We applied scEFSC to the fourteen real scRNA-seq datasets, and the experimental results demonstrated that scEFSC outperformed the other scRNA-seq clustering algorithms on several evaluation metrics. In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment, and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC.
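The pipeline described in the abstract (several unsupervised feature selections, clustering on each filtered matrix, then a weighted combination of the partitions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from scEFSC: the two selectors (variance and variance-to-mean ratio), the plain k-means used as a stand-in for scRNA-seq clustering algorithms, the agreement-based weights, the co-association consensus, and all function names are hypothetical. The data are synthetic Poisson counts.

```python
import numpy as np

def select_by_variance(counts, n_genes):
    """Indices of the n_genes columns with the highest variance."""
    return np.argsort(counts.var(axis=0))[::-1][:n_genes]

def select_by_dispersion(counts, n_genes):
    """Indices of the n_genes columns with the highest variance-to-mean ratio."""
    disp = counts.var(axis=0) / (counts.mean(axis=0) + 1e-8)
    return np.argsort(disp)[::-1][:n_genes]

def kmeans(X, k, seed, n_iter=50):
    """Plain Lloyd's k-means; a stand-in for a real scRNA-seq clusterer."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def coassociation(labels):
    """Cell-by-cell matrix: 1.0 if two cells share a cluster, else 0.0."""
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(partitions):
    """Weight each partition by its mean agreement with the others (assumed scheme)."""
    mats = [coassociation(p) for p in partitions]
    agree = np.array([
        np.mean([np.mean(a == b) for j, b in enumerate(mats) if j != i])
        for i, a in enumerate(mats)
    ])
    weights = agree / agree.sum()
    consensus = sum(w * m for w, m in zip(weights, mats))
    return consensus, weights

def ensemble_cluster(counts, k, n_genes=50, seeds=(0, 1)):
    """Feature selection -> base clusterings -> weighted consensus -> final labels."""
    log_counts = np.log1p(counts)
    partitions = []
    for select in (select_by_variance, select_by_dispersion):
        genes = select(counts, n_genes)
        for seed in seeds:
            partitions.append(kmeans(log_counts[:, genes], k, seed))
    consensus, weights = weighted_consensus(partitions)
    # Final partition: cluster the rows of the consensus matrix.
    final = kmeans(consensus, k, seed=0)
    return final, consensus, weights

# Synthetic example: 60 cells, 200 genes, 3 groups with distinct marker genes.
rng = np.random.default_rng(42)
truth = np.repeat([0, 1, 2], 20)
rates = np.ones((3, 200))
rates[0, :20] = 20
rates[1, 20:40] = 20
rates[2, 40:60] = 20
counts = rng.poisson(rates[truth]).astype(float)

labels, consensus, weights = ensemble_cluster(counts, k=3)
print("cluster sizes:", np.bincount(labels, minlength=3))
print("partition weights:", np.round(weights, 3))
```

In this sketch, partitions that agree more with the rest of the ensemble receive larger weights, so an outlying base clustering has less influence on the consensus. The actual weighting and meta-clustering used by scEFSC should be taken from the linked repository.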