Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, Malaysia.
PLoS One. 2023 Oct 19;18(10):e0292961. doi: 10.1371/journal.pone.0292961. eCollection 2023.
Cell type identification is one of the fundamental tasks in single-cell RNA sequencing (scRNA-seq) studies. It is a key step that facilitates downstream analyses such as differential expression and trajectory inference. scRNA-seq data contain technical variation that can affect the interpretation of cell types. Therefore, gene selection, also known as feature selection in data science, plays an important role in selecting informative genes for scRNA-seq cell type identification. Feature selection methods are generally categorized into filter-, wrapper-, and embedded-based approaches. In the existing literature, filter- and embedded-based methods are widely applied to scRNA-seq gene selection tasks. Wrapper-based methods, which give promising results in other fields, have yet to be extensively utilized for selecting gene features from scRNA-seq data; in addition, most of the existing wrapper methods used in this field are clustering-based rather than classification-based. With a large amount of annotated data available today, this study applied a classification-based approach as an alternative to the clustering-based wrapper method. In our work, a quantum-inspired differential evolution (QDE) algorithm wrapped with a classification method was introduced to select a subset of genes from twelve well-known scRNA-seq transcriptomic datasets to identify cell types. In particular, the QDE was combined with different machine-learning (ML) classifiers, namely logistic regression, decision tree, support vector machine (SVM) with linear and radial basis function kernels, and extreme learning machine. Based on the feature selection results from the experiments, the linear SVM wrapped with QDE, termed QDE-SVM, was chosen. QDE-SVM showed superior cell type classification performance compared with QDE wrapped with the other ML classifiers as well as with recent wrapper methods (i.e., FSCAM, SSD-LAHC, MA-HS, and BSF).
QDE-SVM achieved an average accuracy of 0.9559, while the other wrapper methods achieved average accuracies in the range of 0.8292 to 0.8872.
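To make the idea of a classification-based wrapper concrete, the following is a minimal sketch of one common formulation of quantum-inspired differential evolution, not the authors' implementation. Each gene is given an angle whose squared sine is the probability that the gene is selected, differential evolution mutates the angles, and a classifier scores each observed gene subset. Everything specific here is an assumption made for illustration: the function names (`make_data`, `centroid_accuracy`, `qde_select`), the synthetic two-class data, the parameters `F`, `CR`, population size and generation count, the small penalty on subset size, and the nearest-centroid classifier, which stands in for the linear SVM so that the example needs only the Python standard library.

```python
import math
import random

def make_data(rng, n_cells=60, n_genes=12, n_informative=3):
    """Synthetic two-class data: only the first n_informative genes differ by class."""
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        X.append([rng.gauss(3.0 * label if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Stand-in classifier (assumption): nearest-centroid accuracy on the masked genes."""
    genes = [g for g, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0  # an empty gene subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, t in zip(X_val, y_val):
        pred = min(centroids, key=lambda c: sum(
            (x[g] - m) ** 2 for g, m in zip(genes, centroids[c])))
        correct += pred == t
    return correct / len(y_val)

def qde_select(X_train, y_train, X_val, y_val,
               pop_size=10, generations=20, F=0.5, CR=0.9, seed=0):
    """Wrapper gene selection with a quantum-inspired DE over per-gene angles."""
    rng = random.Random(seed)
    n_genes = len(X_train[0])

    def fitness(mask):
        # Validation accuracy minus a small, assumed penalty on subset size.
        acc = centroid_accuracy(X_train, y_train, X_val, y_val, mask)
        return acc - 0.01 * sum(mask) / n_genes

    def observe(theta):
        # "Measure" each angle: gene g is selected with probability sin^2(theta_g).
        return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in theta]

    pop = [[rng.uniform(0.0, math.pi / 2) for _ in range(n_genes)]
           for _ in range(pop_size)]
    masks = [observe(theta) for theta in pop]
    scores = [fitness(m) for m in masks]

    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation with binomial crossover, applied to the angles.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(n_genes)
            trial = []
            for g in range(n_genes):
                if rng.random() < CR or g == j_rand:
                    v = pop[a][g] + F * (pop[b][g] - pop[c][g])
                    v = min(max(v, 0.0), math.pi / 2)  # keep the angle in range
                else:
                    v = pop[i][g]
                trial.append(v)
            trial_mask = observe(trial)
            trial_score = fitness(trial_mask)
            # Greedy selection: the trial replaces the parent if it is no worse.
            if trial_score >= scores[i]:
                pop[i], masks[i], scores[i] = trial, trial_mask, trial_score

    best = max(range(pop_size), key=scores.__getitem__)
    return masks[best], scores[best]

data_rng = random.Random(42)
X, y = make_data(data_rng)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
best_mask, best_score = qde_select(X_train, y_train, X_val, y_val)
print("selected genes:", [g for g, keep in enumerate(best_mask) if keep])
print("fitness:", round(best_score, 4))
```

In a real scRNA-seq setting the fitness function would more plausibly use cross-validated accuracy of a linear SVM on an annotated expression matrix; the structure of the loop (observe a gene subset, train and score a classifier, keep the better candidate) is what makes the approach a wrapper method rather than a filter or embedded one.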