Haury Anne-Claire, Mordelet Fantine, Vera-Licona Paola, Vert Jean-Philippe
Centre for Computational Biology, Mines ParisTech, Fontainebleau, F-77300 France.
BMC Syst Biol. 2012 Nov 22;6:145. doi: 10.1186/1752-0509-6-145.
Inferring the structure of gene regulatory networks (GRNs) from a collection of gene expression data has many potential applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is, however, a notoriously difficult problem, and the many existing methods achieve only limited accuracy.
In this paper, we formulate GRN inference as a sparse regression problem and investigate how well a popular feature selection method, least angle regression (LARS) combined with stability selection, performs on this task. We introduce a novel, robust and accurate scoring technique for stability selection that improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation with Stability Selection), was ranked among the top GRN inference methods in the DREAM5 gene network inference challenge; in particular, it was evaluated as the best linear regression-based method in the challenge. We investigate in depth the influence of the method's various parameters and show that careful parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference, in both directed and undirected settings.
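To make the regression-plus-stability-selection idea concrete, here is a minimal, dependency-free sketch for a single target gene, where the candidate regulators are the predictors. It is not the TIGRESS code, and several details are assumptions. Each run splits the samples into two disjoint halves and multiplies every candidate's expression column by a random weight drawn uniformly from [alpha, 1], giving randomized LARS. The sketch uses plain LAR, with no lasso drop step, and records the step at which each regulator enters during the first L steps. The score, assumed here to be an "area"-style rule, averages a regulator's selection frequency in the first l steps over l = 1..L. The function names (`lars_entry_steps`, `area_scores`) and the default values of `n_steps`, `n_runs` and `alpha` are hypothetical and chosen only for illustration.

```python
import math
import random


def standardize(v):
    """Center a vector and scale it to unit Euclidean norm."""
    m = sum(v) / len(v)
    c = [x - m for x in v]
    norm = math.sqrt(sum(x * x for x in c))
    return [x / norm for x in c]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _solve(A, b):
    """Solve a small square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def lars_entry_steps(cols, y, n_steps):
    """Plain LAR (no lasso drop step): map feature index -> 1-based step at which it enters."""
    n, p = len(y), len(cols)
    mu = [0.0] * n  # current fitted values
    c0 = [_dot(col, y) for col in cols]
    active = [max(range(p), key=lambda j: abs(c0[j]))]
    while len(active) < n_steps:
        resid = [yi - mi for yi, mi in zip(y, mu)]
        c = [_dot(col, resid) for col in cols]
        C = max(abs(c[j]) for j in active)
        XA = [[math.copysign(1.0, c[j]) * v for v in cols[j]] for j in active]
        g = _solve([[_dot(a, b) for b in XA] for a in XA], [1.0] * len(active))
        AA = 1.0 / math.sqrt(sum(g))
        # Equiangular unit direction u = X_A w with w = AA * G^{-1} 1
        u = [AA * sum(g[k] * XA[k][i] for k in range(len(active))) for i in range(n)]
        a = [_dot(col, u) for col in cols]
        gamma, nxt = math.inf, None
        for j in set(range(p)) - set(active):
            for num, den in ((C - c[j], AA - a[j]), (C + c[j], AA + a[j])):
                if den > 1e-12 and 1e-12 < num / den < gamma:
                    gamma, nxt = num / den, j
        if nxt is None:
            break
        mu = [m + gamma * ui for m, ui in zip(mu, u)]
        active.append(nxt)
    return {j: step + 1 for step, j in enumerate(active)}


def area_scores(cols, y, n_steps=3, n_runs=50, alpha=0.4, seed=0):
    """Randomized LARS + stability selection; score = mean selection frequency over steps 1..L."""
    rng = random.Random(seed)
    n, p = len(y), len(cols)
    score = [0.0] * p
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        for half in (idx[: n // 2], idx[n // 2:]):  # two disjoint subsamples per run
            w = [rng.uniform(alpha, 1.0) for _ in range(p)]  # random feature reweighting
            sub = [[w[j] * cols[j][i] for i in half] for j in range(p)]
            for j, k in lars_entry_steps(sub, [y[i] for i in half], n_steps).items():
                # Entering at step k means "selected in the first l steps" for l = k..L
                score[j] += (n_steps - k + 1) / n_steps
    return [s / (2 * n_runs) for s in score]


if __name__ == "__main__":
    rng = random.Random(1)
    raw = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(5)]  # 5 candidate regulators
    y = standardize([3 * raw[0][i] + 0.3 * rng.gauss(0, 1) for i in range(40)])
    print([round(s, 2) for s in area_scores([standardize(c) for c in raw], y)])
```

In this toy example only the first candidate actually drives the target, so its score should be close to 1, while the pure-noise candidates share the remaining entries in the first L steps. Under the same assumptions, a full network would be obtained by repeating this for every target gene and ranking all regulator-to-target pairs by their scores.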
TIGRESS reaches state-of-the-art performance on benchmark data, including both in silico and in vivo (E. coli and S. cerevisiae) networks. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available at http://cbio.ensmp.fr/tigress. Moreover, TIGRESS can be run online through the GenePattern platform (GP-DREAM, http://dream.broadinstitute.org).