Steinke Florian, Seeger Matthias, Tsuda Koji
Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany.
BMC Syst Biol. 2007 Nov 16;1:51. doi: 10.1186/1752-0509-1-51.
Identifying large gene regulatory networks is an important task, but acquiring data through perturbation experiments (e.g., gene switches, RNAi, heterozygotes) is expensive. It is thus desirable to use an identification method that effectively incorporates available prior knowledge, such as sparse connectivity, and that allows experiments to be designed so that maximal information is gained from each one.
Our main contributions are twofold. First, we provide a method for consistent inference of network structure that incorporates prior knowledge about sparse connectivity. The algorithm is time efficient and robust to violations of model assumptions. Second, we show how to use it for optimal experimental design, substantially reducing the number of experiments required. We employ sparse linear models and show how to perform full Bayesian inference for them. Rather than estimating only a single maximum likelihood network, we compute a posterior distribution over networks, using a novel variant of the expectation propagation method. This representation of uncertainty enables effective experimental design in a standard statistical setting: experiments are selected so that they are maximally informative.
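To make the design criterion concrete, the following is a minimal sketch of selecting a maximally informative experiment. It is not the method of the paper: it assumes a Gaussian prior in place of the sparsity prior, so the posterior is available in closed form and no expectation propagation is needed. It also assumes a simplified setting in which one row `a` of the network matrix is observed through scalar measurements `y = a^T x + noise`. All function names, the candidate set, and the noise variance are hypothetical.

```python
import math

def quad_form(cov, x):
    """Return x^T cov x for a square matrix stored as a list of rows."""
    n = len(x)
    return sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))

def info_gain(cov, x, noise_var):
    """Expected information gain (in nats) about a from y = a^T x + noise.

    Assumes a Gaussian belief over a with covariance cov (a stand-in for
    the sparse prior) and Gaussian noise with variance noise_var.
    """
    return 0.5 * math.log(1.0 + quad_form(cov, x) / noise_var)

def update_cov(cov, x, noise_var):
    """Rank-one update of the (symmetric) covariance after measuring along x."""
    n = len(x)
    cx = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = noise_var + sum(x[i] * cx[i] for i in range(n))
    return [[cov[i][j] - cx[i] * cx[j] / denom for j in range(n)]
            for i in range(n)]

def pick_experiment(cov, candidates, noise_var):
    """Return the index of the candidate with the largest expected gain."""
    gains = [info_gain(cov, x, noise_var) for x in candidates]
    return max(range(len(candidates)), key=gains.__getitem__)

if __name__ == "__main__":
    # Hypothetical three-gene example with a unit-variance prior.
    cov = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    noise_var = 0.1
    for step in range(3):
        k = pick_experiment(cov, candidates, noise_var)
        print(step, k, round(info_gain(cov, candidates[k], noise_var), 3))
        cov = update_cov(cov, candidates[k], noise_var)
```

In this Gaussian stand-in the gain does not depend on the measured outcomes, so the whole sequence of experiments could be fixed in advance. With a sparsity prior the posterior depends on the data, which is why sequential selection from an approximate posterior becomes worthwhile.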
Few methods have addressed the design issue so far. Compared to the most well-known one, our method is more transparent and is shown to perform qualitatively better. The former requires hard and unrealistic constraints on the network structure merely for computational tractability, whereas our method requires no such constraints. We demonstrate reconstruction and optimal experimental design capabilities on tasks generated from realistic non-linear network simulators. The methods described in the paper are available as a Matlab package at http://www.kyb.tuebingen.mpg.de/sparselinearmodel.