Marshall Stephen, Yu Le, Xiao Yufei, Dougherty Edward R
Department of Electronic and Electrical Engineering, Faculty of Engineering, University of Strathclyde, Glasgow, UK.
EURASIP J Bioinform Syst Biol. 2007;2007(1):32454. doi: 10.1155/2007/32454.
The inference of gene regulatory networks is a key issue for genomic signal processing. This paper addresses the inference of probabilistic Boolean networks (PBNs) from observed temporal sequences of network states. Since a PBN is composed of a finite number of Boolean networks, a basic observation is that the characteristics of a single Boolean network without perturbation can be determined from its pairwise state transitions. Because the network function is fixed and there are no perturbations, a given state will always be followed by a unique state at the succeeding time point. Thus, a transition counting matrix compiled over a data sequence will be sparse and contain only one nonzero entry per row. If the network also has perturbations, with small perturbation probability, then the transition counting matrix will have some small nonzero entries replacing some (or all) of the zeros. If a data sequence is long enough to adequately populate the matrix, then determining the functions and inputs underlying the model is straightforward. The difficulty arises when the transition counting matrix contains data derived from more than one Boolean network. We address PBN inference in three steps: (1) separate the data sequence into "pure" subsequences corresponding to constituent Boolean networks; (2) given a subsequence, infer a Boolean network; and (3) infer the perturbation probability, the probability of a switch between constituent Boolean networks, and the selection probabilities governing which network is selected given a switch. Capturing the full dynamic behavior of probabilistic Boolean networks, be they binary or multivalued, will require temporal data, and a great deal of it. This should not be surprising given the complexity of the model and the number of parameters, both transitional and static, that must be estimated.
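A minimal Python sketch of the transition counting matrix described above follows. It assumes binary genes, states encoded as integers with the first gene as the most significant bit, and a hypothetical two-gene network (x1' = x2, x2' = NOT x1) used only for illustration; the helper names (`transition_counts`, `dominant_successors`, and so on) are ours, not the paper's. Without perturbation, every observed row holds exactly one nonzero entry, so reading off the most frequent successor per row recovers the state-transition function.

```python
def state_index(state):
    """Encode a tuple of 0/1 gene values as an integer (first gene = most significant bit)."""
    idx = 0
    for bit in state:
        idx = (idx << 1) | bit
    return idx


def transition_counts(sequence, n_genes):
    """Count transitions between consecutive states in a 2^n x 2^n matrix."""
    size = 2 ** n_genes
    counts = [[0] * size for _ in range(size)]
    for current, nxt in zip(sequence, sequence[1:]):
        counts[state_index(current)][state_index(nxt)] += 1
    return counts


def nonzeros_per_row(counts):
    """Number of distinct successors observed for each state."""
    return [sum(1 for c in row if c > 0) for row in counts]


def dominant_successors(counts):
    """Most frequent successor index for each observed state (None if never observed)."""
    return {
        s: (max(range(len(row)), key=row.__getitem__) if sum(row) > 0 else None)
        for s, row in enumerate(counts)
    }


def toy_step(state):
    # Hypothetical network for illustration only: x1' = x2, x2' = NOT x1.
    x1, x2 = state
    return (x2, 1 - x1)


def simulate(step, start, length):
    """Generate an unperturbed state sequence from a deterministic update rule."""
    seq = [start]
    for _ in range(length - 1):
        seq.append(step(seq[-1]))
    return seq


sequence = simulate(toy_step, (0, 0), 9)
counts = transition_counts(sequence, n_genes=2)
print(nonzeros_per_row(counts))     # [1, 1, 1, 1]
print(dominant_successors(counts))  # {0: 1, 1: 3, 2: 0, 3: 2}
```

With small perturbation probabilities and a long enough sequence, the most frequent entry in each row would still be expected to point at the network successor, while perturbed transitions show up as the small off-successor counts.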
In addition to providing an inference algorithm, this paper demonstrates that the data requirement is much smaller if one does not wish to infer the switching, perturbation, and selection probabilities, and that constituent-network connectivity can be discovered with decent accuracy for relatively small time-course sequences.
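The connectivity claim can be illustrated with a simple sensitivity test, sketched here in Python under our own assumptions (it is not necessarily the paper's procedure). Given a table of inferred successor states for one constituent network, gene i is taken as an input of gene j if two observed states differing only in x_i have successors that differ in x_j. The function `infer_inputs` and the two-gene network x1' = x2, x2' = NOT x1 are hypothetical.

```python
def infer_inputs(successor_table, n_genes):
    """For each gene j, collect genes i such that flipping x_i alone changes x_j at the next step.

    successor_table maps observed states (tuples of 0/1) to their inferred successors.
    State pairs whose partner was never observed give no evidence, so with short
    sequences some true inputs may be missed (the result is a lower bound).
    """
    inputs = {j: set() for j in range(n_genes)}
    for state, nxt in successor_table.items():
        for i in range(n_genes):
            partner = list(state)
            partner[i] = 1 - partner[i]
            other = successor_table.get(tuple(partner))
            if other is None:
                continue  # partner state unobserved: no evidence either way
            for j in range(n_genes):
                if nxt[j] != other[j]:
                    inputs[j].add(i)
    return inputs


# Successor table of a hypothetical network x1' = x2, x2' = NOT x1 (genes indexed 0 and 1).
table = {(0, 0): (0, 1), (0, 1): (1, 1), (1, 1): (1, 0), (1, 0): (0, 0)}
print(infer_inputs(table, n_genes=2))  # {0: {1}, 1: {0}}
```

Because only observed state pairs contribute evidence, a short time course yields a partial connectivity estimate rather than none, which is consistent in spirit with recovering connectivity from modest amounts of data.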