Yang Bin, Bao Wenzheng, Chen Baitong, Song Dan
School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China.
School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China.
BioData Min. 2022 Jun 11;15(1):13. doi: 10.1186/s13040-022-00297-8.
Single-cell RNA-seq overcomes the shortcomings of conventional transcriptome sequencing and provides a powerful tool for distinguishing the transcriptome characteristics of the various cell types in biological tissues and for comprehensively revealing the heterogeneity of gene expression between cells. Many intelligent computing methods have been proposed to infer gene regulatory networks (GRNs) from single-cell RNA-seq data. In this paper, we investigate the performance of seven classifiers, namely support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), logistic regression (LR), decision tree (DT) and K-nearest neighbor (KNN), on the binary classification problem of GRN inference from single-cell RNA-seq data (Single_cell_GRN). For SVM, three kernel functions (linear, polynomial and radial basis function) are evaluated separately. Three real single-cell RNA-seq datasets from mouse and human are used. The experimental results show that in most cases the supervised learning methods (SVM, RF, NB, GBDT, LR, DT and KNN) perform better than the unsupervised learning method (GENIE3) in terms of AUC. SVM, RF and KNN perform better than the other four classifiers. For SVM, the linear and polynomial kernels are better suited to modeling single-cell RNA-seq data.
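The abstract frames GRN inference as binary classification scored by AUC. The following is a minimal sketch of that evaluation idea, not the authors' pipeline: it assumes each sample is a candidate regulator-target gene pair with a made-up two-dimensional feature vector, uses a toy KNN scorer as a stand-in for any of the seven classifiers, and computes AUC as the fraction of positive-negative pairs ranked correctly. The names `roc_auc` and `knn_scores` and all data values are hypothetical.

```python
from math import dist


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_scores(train_x, train_y, test_x, k=3):
    """Score each test sample by the fraction of positives among its k nearest training samples."""
    scores = []
    for x in test_x:
        nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], x))[:k]
        scores.append(sum(y for _, y in nearest) / k)
    return scores


# Toy data (invented for illustration): one row per candidate gene pair,
# label 1 = known regulatory link, label 0 = no known link.
train_x = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]
train_y = [1, 1, 1, 0, 0, 0]
test_x = [(0.85, 0.75), (0.15, 0.25), (0.6, 0.8), (0.25, 0.2)]
test_y = [1, 0, 1, 0]

scores = knn_scores(train_x, train_y, test_x, k=3)
auc = roc_auc(test_y, scores)
print(f"KNN AUC on toy gene pairs: {auc:.2f}")
```

Because AUC depends only on how the scores rank positives against negatives, the same `roc_auc` function could compare any classifier that outputs a confidence per gene pair, including an unsupervised ranking method.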