NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

Authors

Ruyssinck Joeri, Huynh-Thu Vân Anh, Geurts Pierre, Dhaene Tom, Demeester Piet, Saeys Yvan

Affiliations

Department of Information Technology, Ghent University - iMinds, Gent, Belgium.

Department of Electrical Engineering and Computer Science & GIGA-R, Systems and Modeling, University of Liège, Liège, Belgium; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom.

Publication information

PLoS One. 2014 Mar 25;9(3):e92709. doi: 10.1371/journal.pone.0092709. eCollection 2014.

Abstract

One of the long-standing open challenges in computational systems biology is inferring the topology of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into a separate regression problem for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered putative evidence of a regulatory link between the two genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. Second, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that it outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
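The three ideas in the abstract (one regression problem per target gene, a subsampling wrapper that turns a feature ranker into an ensemble importance score, and rankwise averaging across methods) can be illustrated with a minimal NumPy sketch. This is not the published NIMEFI implementation. The two base rankers (absolute correlation and least-squares coefficients) are simple stand-ins for the methods named in the abstract, and every function name, the top-k counting rule, and parameters such as `n_rounds`, `sample_frac` and `top_k` are assumptions made for illustration.

```python
import numpy as np


def abs_correlation_scores(X, y):
    """Stand-in base ranker: |Pearson correlation| of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom


def abs_lstsq_scores(X, y):
    """Second stand-in ranker: |coefficient| from a centered least-squares fit."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return np.abs(coef)


def ensemble_importance(X, y, base_ranker, rng, n_rounds=50,
                        sample_frac=0.8, top_k=1):
    """Assumed subsampling scheme: rank features on random subsets of the
    samples and return how often each feature lands in the top_k."""
    n_samples, n_features = X.shape
    n_sub = max(2, int(sample_frac * n_samples))
    counts = np.zeros(n_features)
    for _ in range(n_rounds):
        idx = rng.choice(n_samples, size=n_sub, replace=False)
        scores = base_ranker(X[idx], y[idx])
        counts[np.argsort(scores)[::-1][:top_k]] += 1
    return counts / n_rounds


def infer_network(expr, base_ranker, seed=0, **kwargs):
    """expr is samples x genes. W[i, j] scores gene i as a regulator of gene j."""
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[1]
    W = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        # Regression decomposition: predict one gene from all the others.
        predictors = [g for g in range(n_genes) if g != target]
        W[predictors, target] = ensemble_importance(
            expr[:, predictors], expr[:, target], base_ranker, rng, **kwargs)
    return W


def rank_average(score_matrices):
    """Average per-edge ranks across methods (1 = strongest, lower is better).
    Ties are broken by edge order rather than given a shared rank."""
    n = score_matrices[0].shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    ranks = []
    for W in score_matrices:
        flat = W[off_diag]
        order = np.argsort(-flat, kind="stable")
        r = np.empty(flat.size, dtype=float)
        r[order] = np.arange(1, flat.size + 1)
        ranks.append(r)
    avg = np.full((n, n), np.nan)  # self-edges are left undefined
    avg[off_diag] = np.mean(ranks, axis=0)
    return avg


# Toy data (synthetic, for illustration only): gene 1 is driven by gene 0.
rng = np.random.default_rng(0)
expr = rng.normal(size=(60, 5))
expr[:, 1] = 2.0 * expr[:, 0] + 0.1 * rng.normal(size=60)

W_corr = infer_network(expr, abs_correlation_scores, seed=1)
W_lstsq = infer_network(expr, abs_lstsq_scores, seed=2)
avg_rank = rank_average([W_corr, W_lstsq])
```

Averaging ranks rather than raw scores sidesteps the fact that different importance measures live on different scales, which is presumably why a rankwise combination is used. In this toy setup the edge from gene 0 to gene 1 should receive the best average rank.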

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2fbd/3965471/248160191012/pone.0092709.g001.jpg
