State Key Laboratory of Industrial Control Technology, Institute of Cyber-Systems and Control of Zhejiang University, Hangzhou, China.
Mol Inform. 2020 May;39(5):e1900075. doi: 10.1002/minf.201900075. Epub 2020 Jan 28.
Inferring gene regulatory networks with high accuracy from gene expression data sets is one of the most challenging problems in computational biology. To improve the accuracy of gene regulatory network inference and to find hub genes, we propose a novel model integration network inference method with clustering and hub gene finding, called MINICHG. The method has three main steps: (1) constructing a feature matrix from the inference results of three single models based on machine learning algorithms; (2) clustering gene pairs with k-means according to the feature matrix; (3) finding hub genes. MINICHG integrates RF (Random Forest), GBDT (Gradient Boosting Decision Tree) and Pearson correlation results with a novel weighted strategy in a semi-unsupervised way. The optimization scheme designed for MINICHG takes the sparsity of gold standard data into account and is suitable for most gene regulatory network reconstruction tasks. We evaluated the proposed method on simulated data from five DREAM4 multifactorial data sets and the DREAM5 in silico data set, and on a real data set from E. coli. MINICHG outperformed other network inference methods in both accuracy and robustness.
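The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the RF and GBDT importance scores are already available as gene-by-gene matrices, uses a plain two-cluster k-means written in NumPy, ranks hub genes by out-degree, and leaves out the weighted strategy, which the abstract does not specify. All function names (`pearson_scores`, `build_feature_matrix`, `kmeans_two_clusters`, `hub_genes`) are hypothetical.

```python
import numpy as np


def pearson_scores(expr):
    """Absolute Pearson correlation between genes; expr has shape (samples, genes)."""
    scores = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(scores, 0.0)  # ignore self-regulation
    return scores


def build_feature_matrix(score_mats):
    """Step 1: one row per ordered gene pair (i != j), one column per method."""
    n_genes = score_mats[0].shape[0]
    pairs = [(i, j) for i in range(n_genes) for j in range(n_genes) if i != j]
    idx = np.array(pairs)
    feats = np.column_stack([m[idx[:, 0], idx[:, 1]] for m in score_mats])
    return pairs, feats


def kmeans_two_clusters(feats, n_iter=50):
    """Step 2: plain k-means with k=2; returns a boolean 'edge' mask per pair.

    Centers start at the rows with the lowest and highest total score
    (an assumption made here to keep the sketch deterministic).
    """
    totals = feats.sum(axis=1)
    centers = np.stack([feats[totals.argmin()], feats[totals.argmax()]])
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            feats[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    edge_cluster = int(centers.sum(axis=1).argmax())  # higher scores = likely edges
    return labels == edge_cluster


def hub_genes(edges, n_genes, top=1):
    """Step 3: rank regulators by the number of predicted outgoing edges."""
    degree = np.zeros(n_genes, dtype=int)
    for regulator, _target in edges:
        degree[regulator] += 1
    order = np.argsort(-degree, kind="stable")[:top]
    return [(int(g), int(degree[g])) for g in order]


def infer_network(score_mats, top=1):
    """Run the three steps on precomputed per-method score matrices."""
    pairs, feats = build_feature_matrix(score_mats)
    is_edge = kmeans_two_clusters(feats)
    edges = [pair for pair, keep in zip(pairs, is_edge) if keep]
    return edges, hub_genes(edges, score_mats[0].shape[0], top=top)
```

With real data, the first two matrices would come from per-target-gene regression models (for example, feature importances from a random forest and a gradient boosting regressor), and the third from `pearson_scores`. Clustering gene pairs instead of thresholding a single combined score avoids choosing a cut-off by hand, which is one plausible reading of why the method clusters before hub finding.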