Department of Biochemistry and Biophysics, Stockholm Bioinformatics Center, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.
Department of Physics, Chemistry and Biology/Bioinformatics, Linköping University, Linköping, Sweden.
Bioinformatics. 2019 Mar 15;35(6):1026-1032. doi: 10.1093/bioinformatics/bty764.
Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data are not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.
To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, through repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values into false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Inference accuracy improved in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as for running and analyzing the inferences.
https://bitbucket.org/sonnhammergrni/genespider/src/NB/%2BMethods/NestBoot.m.
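The procedure described above can be illustrated with a minimal Python sketch. It is not the GeneSPIDER NestBoot implementation (which is written in MATLAB), and several details are assumptions made for illustration only: the least-squares estimator `A = -P Y^+` with a hard cutoff `zeta`, resampling of experiments (columns) with replacement, shuffling of all entries of the expression matrix to build the null, and a simple ratio of null to measured link counts as the false discovery rate estimate. All function and parameter names are hypothetical.

```python
import numpy as np

def infer_links(Y, P, zeta):
    """Assumed inference step: least squares A = -P Y^+, keep |a_ij| >= zeta."""
    A = -P @ np.linalg.pinv(Y)
    return np.abs(A) >= zeta

def bootstrap_support(Y, P, zeta, n_boot, rng):
    """Fraction of bootstrap runs in which each link is inferred."""
    n, m = Y.shape
    counts = np.zeros((n, n))
    for _ in range(n_boot):
        idx = rng.integers(0, m, size=m)  # resample experiments with replacement
        counts += infer_links(Y[:, idx], P[:, idx], zeta)
    return counts / n_boot

def nested_support(Y, P, zeta, n_outer, n_inner, rng):
    """Repeat the bootstrap n_outer times; result has shape (n_outer, n, n)."""
    runs = [bootstrap_support(Y, P, zeta, n_inner, rng) for _ in range(n_outer)]
    return np.stack(runs)

def shuffle_data(Y, rng):
    """Assumed null model: permute all entries of the expression matrix."""
    return rng.permutation(Y.ravel()).reshape(Y.shape)

def support_threshold(sup_data, sup_null, fdr_target=0.05):
    """Smallest support cutoff whose estimated FDR is at or below the target.

    FDR is estimated as (mean number of null links at or above the cutoff)
    divided by (mean number of measured links at or above the cutoff).
    Returns None if no cutoff on the grid satisfies the target.
    """
    for s in np.linspace(0.0, 1.0, 101):
        n_data = np.mean((sup_data >= s).sum(axis=(1, 2)))
        n_null = np.mean((sup_null >= s).sum(axis=(1, 2)))
        if n_data > 0 and n_null / n_data <= fdr_target:
            return float(s)
    return None
```

In use, one would call `nested_support` once on the measured data and once on the output of `shuffle_data`, pass both results to `support_threshold`, and keep only links whose support in the measured data reaches the returned cutoff. The spread of a link's support across the outer runs gives a rough indication of how stable its estimate is.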