Lopes Fabrício Martins, de Oliveira Evaldo A, Cesar Roberto M
Federal University of Technology - Paraná, Brazil.
BMC Syst Biol. 2011 May 5;5:61. doi: 10.1186/1752-0509-5-61.
Inferring gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems in systems biology. Many techniques and models have been proposed for this task. However, it is generally not possible to recover the original topology with high accuracy, mainly because the available time series are short relative to the high complexity of the networks and because expression measurements are intrinsically noisy. To improve the accuracy of GRN inference methods based on entropy (mutual information), a new criterion function is proposed here.
In this paper we introduce the use of the generalized entropy proposed by Tsallis for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach, with the conditional entropy applied as the criterion function. To assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expression data generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks, and its gene transfer functions are drawn at random from the set of possible Boolean functions, which defines its dynamics. The DREAM time series data, in contrast, cover networks of varying size whose topologies are based on real networks, and whose dynamics are generated by continuous differential equations with noise and perturbation. Adopting both data sources makes it possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes.
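The idea of using a conditional Tsallis entropy as a feature selection criterion can be illustrated with a minimal Python sketch. It is not the DimReduction implementation: the function names, the frequency-weighted mean used for the conditional entropy, the exhaustive subset search (a stand-in for whatever search strategy the authors use), and the toy Boolean time series are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1).

    For q == 1 the Shannon entropy (in nats) is returned, which is the
    limit of S_q as q approaches 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


def conditional_entropy(predictor_states, target_values, q):
    """Mean Tsallis entropy of the target given the joint predictor state.

    Assumption: each observed predictor state is weighted by its relative
    frequency; the paper's criterion may differ in its details.
    """
    groups = defaultdict(Counter)
    for state, value in zip(predictor_states, target_values):
        groups[tuple(state)][value] += 1
    n = len(target_values)
    total = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        probs = [c / m for c in counts.values()]
        total += (m / n) * tsallis_entropy(probs, q)
    return total


def best_predictors(series, target_gene, max_inputs, q):
    """Find the predictor subset at time t that minimises the conditional
    entropy of the target gene at time t + 1 (exhaustive search)."""
    n_genes = len(series[0])
    target = [state[target_gene] for state in series[1:]]
    best_score, best_subset = float("inf"), ()
    for k in range(1, max_inputs + 1):
        for subset in combinations(range(n_genes), k):
            preds = [tuple(state[g] for g in subset) for state in series[:-1]]
            score = conditional_entropy(preds, target, q)
            # Require a strict improvement so smaller subsets win ties.
            if score < best_score - 1e-12:
                best_score, best_subset = score, subset
    return best_score, best_subset


# Hypothetical toy data: gene 2 at time t + 1 equals (gene 0 AND gene 1) at t.
example_series = [
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
    (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0),
]

if __name__ == "__main__":
    print(best_predictors(example_series, target_gene=2, max_inputs=2, q=2.5))
```

On this toy series, the pair of genes 0 and 1 is the only candidate subset whose conditional entropy reaches zero, so it is selected as the predictor set for gene 2. Repeating the search for every target gene would yield one inferred set of incoming edges per gene.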
The experimental results showed a remarkable improvement in accuracy: the non-Shannon entropy reduced the number of false connections in the inferred topology. The best value of the free parameter q of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigating the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.
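For reference, the standard form of the Tsallis entropy and its pseudo-additivity rule show why q > 1 corresponds to the subextensive regime. The notation below (p_i for state probabilities, A and B for two independent subsystems, the constant k set to 1) is the usual textbook convention and is assumed here rather than taken from the paper.

```latex
\begin{align}
  S_q &= \frac{1 - \sum_i p_i^{\,q}}{q - 1},
  &
  \lim_{q \to 1} S_q &= -\sum_i p_i \ln p_i
  \quad \text{(Shannon entropy)}, \\[4pt]
  S_q(A, B) &= S_q(A) + S_q(B) + (1 - q)\, S_q(A)\, S_q(B)
  && \text{for independent } A \text{ and } B .
\end{align}
% For q > 1 the cross term (1 - q) S_q(A) S_q(B) is non-positive, so
% S_q(A, B) <= S_q(A) + S_q(B): the entropy is subextensive.
```

With values such as 2.5 ≤ q ≤ 3.5, the cross term is negative whenever both subsystems have non-zero entropy, so the joint entropy is strictly smaller than the sum of the parts.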