Yang Bin, Li Jing, Li Xiang, Liu Sanrong
School of Information Science and Engineering, Zaozhuang University, No. 1 Beian Road, Zaozhuang 277160, China.
Information Department, Qingdao Eighth People's Hospital, No. 84 Fengshan Road, Qingdao 266121, China.
Brief Funct Genomics. 2024 Dec 6;23(6):866-878. doi: 10.1093/bfgp/elae036.
Gene regulatory networks (GRNs) help explain gene function, cancer development, and the impact of key genes on disease. To improve GRN identification accuracy, this study proposes an ensemble method based on 13 base classification methods and a flexible neural tree (FNT). The base classification methods are ridge classification, stochastic gradient descent, Gaussian process classification, Bernoulli naive Bayes, adaptive boosting, gradient boosting decision tree, histogram-based gradient boosting classification, eXtreme gradient boosting (XGBoost), multilayer perceptron, light gradient boosting machine, random forest, support vector machine, and k-nearest neighbor; together they serve as the input variable set of the FNT model. Additionally, a hybrid evolutionary algorithm based on a gene programming variant and particle swarm optimization is developed to search for the optimal FNT model. Experiments on three simulation datasets and three real single-cell RNA-seq datasets demonstrate that the proposed ensemble method outperforms the 13 supervised algorithms, seven unsupervised algorithms (ARACNE, CLR, GENIE3, MRNET, PCACMI, GENECI, and EPCACMI), and four single-cell-specific methods (SCODE, BiRGRN, LEAP, and BiGBoost) in terms of the area under the receiver operating characteristic curve, the area under the precision-recall curve, and F1 score.
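The way an FNT can combine base-classifier outputs may be easier to see in a small sketch. The abstract does not give the tree structure, the activation function, or the form of the inputs, so everything below is an assumption: the inputs are taken to be per-classifier scores in [0, 1] for one candidate regulatory edge, the activation is the Gaussian-type flexible function f(a, b, x) = exp(-((x - a) / b)^2) commonly used in the FNT literature, and the names `Leaf`, `FlexibleNeuron`, `evaluate`, and `predict_edge` are hypothetical. The tree and its parameters are fixed by hand here, whereas the study searches for them with its hybrid evolutionary algorithm.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    """Terminal node: reads one base-classifier score (assumed input form)."""
    index: int


@dataclass
class FlexibleNeuron:
    """Internal node: weighted sum of children passed through a flexible activation."""
    children: List[Union["Leaf", "FlexibleNeuron"]]
    weights: List[float]
    a: float  # activation centre (assumed parameterisation)
    b: float  # activation width, must be non-zero


def evaluate(node, scores):
    """Recursively evaluate the tree on one vector of base-classifier scores."""
    if isinstance(node, Leaf):
        return scores[node.index]
    net = sum(w * evaluate(child, scores)
              for w, child in zip(node.weights, node.children))
    # Gaussian-type flexible activation; output lies in (0, 1].
    return math.exp(-((net - node.a) / node.b) ** 2)


def predict_edge(tree, scores, threshold=0.5):
    """Call a regulatory edge present when the tree output reaches the threshold."""
    return evaluate(tree, scores) >= threshold


# Hand-built illustrative tree over three of the base-classifier scores.
# Structure, weights, a, and b are made up for this sketch, not taken from the paper.
example_tree = FlexibleNeuron(
    children=[
        Leaf(0),
        FlexibleNeuron(children=[Leaf(1), Leaf(2)], weights=[0.5, 0.5], a=1.0, b=1.0),
    ],
    weights=[0.6, 0.4],
    a=1.0,
    b=0.8,
)
```

Calling `predict_edge(example_tree, [0.9, 0.8, 0.7])` evaluates the tree on three high scores, and `predict_edge(example_tree, [0.1, 0.2, 0.1])` on three low ones. Because a leaf may appear anywhere in the tree and not every input needs to be used, a structure search over such trees also acts as a selection step among the base classifiers.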