Tjärnberg Andreas, Nordling Torbjörn E M, Studham Matthew, Sonnhammer Erik L L
Stockholm Bioinformatics Center, Science for Life Laboratory, Stockholm, Sweden.
J Comput Biol. 2013 May;20(5):398-408. doi: 10.1089/cmb.2012.0268.
Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained with a poor choice of ζ. To avoid such poor choices, we propose a method for optimization of ζ, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on both network inference and our ζ optimization method. We demonstrate that our ζ optimization method, applied to two widely used inference algorithms, Glmnet and NIR, gives accurate and informative estimates of the network structure, provided that the data are informative enough.
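The selection rule described above (leave one sample out, fit on the rest, and keep the ζ with the smallest prediction error) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it stands in a small hand-written lasso (coordinate descent, NumPy only) for Glmnet or NIR, treats a single target gene as a sparse linear function of the other measurements, and uses mean squared error on the left-out sample as the prediction error. The function names `lasso_cd` and `loo_select_zeta`, the ζ grid, and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x towards zero by t (the lasso proximal step)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(X, y, zeta, n_iter=100):
    """Stand-in sparse regressor (assumption, not Glmnet or NIR):
    minimise (1/2n)*||y - Xw||^2 + zeta*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Residual with the contribution of feature j removed.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, zeta) / col_sq[j]
    return w

def loo_select_zeta(X, y, zetas):
    """Return the zeta with the lowest leave-one-out prediction error,
    together with the error obtained for every candidate zeta."""
    n = X.shape[0]
    errors = []
    for zeta in zetas:
        sq_err = 0.0
        for i in range(n):
            keep = np.arange(n) != i          # drop sample i
            w = lasso_cd(X[keep], y[keep], zeta)
            sq_err += (y[i] - X[i] @ w) ** 2  # error on the left-out sample
        errors.append(sq_err / n)
    errors = np.array(errors)
    return zetas[int(np.argmin(errors))], errors

# Hypothetical synthetic data: 20 samples, 8 candidate regulators, 2 true links.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 8))
w_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ w_true + 0.1 * rng.normal(size=20)

zeta_grid = np.logspace(-3, 0.5, 8)
best_zeta, loo_errors = loo_select_zeta(X_demo, y_demo, zeta_grid)
print("selected zeta:", best_zeta)
print("leave-one-out errors:", loo_errors)
```

In this toy setting the largest ζ values shrink nearly every coefficient to zero, so comparing their error with the minimum gives a feel for how much a poorly chosen ζ costs. With few or noisy samples the error curve can be flat or erratic, which is the regime the abstract warns about.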