https://ror.org/05td3s095 College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, China.
https://ror.org/05td3s095 Center for Data Science and Intelligent Computing, Nanjing Agricultural University, Nanjing, China.
Life Sci Alliance. 2023 Oct 3;6(12). doi: 10.26508/lsa.202302103. Print 2023 Dec.
Single-cell RNA sequencing (scRNA-seq) enables researchers to reveal previously unknown cell heterogeneity and functional diversity, which is not possible with bulk RNA sequencing. Clustering approaches are widely used to analyze scRNA-seq data and to identify cell types and states. Various advanced computational strategies have emerged in the past few years; however, low generalizability and high computational cost remain the main bottlenecks of existing methods. In this study, we established a novel computational framework, scFseCluster, for scRNA-seq clustering analysis. scFseCluster incorporates a metaheuristic algorithm (Feature Selection based on Quantum Squirrel Search Algorithm) to extract the optimal gene set, which largely guarantees the performance of cell clustering. We conducted simulation experiments covering several aspects to verify the performance of the proposed approach. scFseCluster performed very well on eight benchmark scRNA-seq datasets because of the optimal gene sets obtained using the Feature Selection based on Quantum Squirrel Search Algorithm. The comparative study demonstrated significant advantages of scFseCluster over seven state-of-the-art algorithms. In addition, our analysis shows that feature selection on highly variable genes can significantly improve clustering performance. In conclusion, our study demonstrates that scFseCluster is a highly versatile tool for enhancing scRNA-seq data clustering analysis.
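As a rough illustration of the gene-filtering step that precedes clustering, the sketch below log-normalizes a small count matrix and keeps the genes with the highest variance across cells. It is not the Feature Selection based on Quantum Squirrel Search Algorithm and is not taken from scFseCluster; plain variance ranking stands in for highly variable gene selection, and the function names (`log_normalize`, `top_variable_genes`, `subset_genes`) and the 10,000-count scaling factor are assumptions made for this example.

```python
import math
from statistics import pvariance


def log_normalize(counts, scale=1e4):
    """Scale each cell (row) to `scale` total counts, then apply log1p.

    `scale` is an assumed, commonly used value, not one from the paper.
    """
    normalized = []
    for cell in counts:
        total = sum(cell) or 1  # guard against empty cells
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


def top_variable_genes(expr, k):
    """Return indices of the k genes (columns) with the highest variance."""
    n_genes = len(expr[0])
    variances = [pvariance([cell[g] for cell in expr]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return ranked[:k]


def subset_genes(expr, gene_idx):
    """Keep only the selected gene columns for every cell."""
    return [[cell[g] for g in gene_idx] for cell in expr]


if __name__ == "__main__":
    # Toy data: 6 cells x 4 genes; genes 0 and 1 separate two cell groups,
    # genes 2 and 3 are uninformative.
    counts = [[50, 0, 5, 5]] * 3 + [[0, 50, 5, 5]] * 3
    expr = log_normalize(counts)
    selected = top_variable_genes(expr, k=2)
    reduced = subset_genes(expr, selected)
    print("selected genes:", selected)
    print("reduced matrix shape:", len(reduced), "x", len(reduced[0]))
```

The reduced matrix would then be passed to a clustering method. In the framework described above, a metaheuristic search further refines the gene set, a step this sketch does not attempt.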