Chemmangattuvalappil Nishanth, Task Keith, Banerjee Ipsita
Department of Chemical Engineering, University of Pittsburgh, 1249 Benedum Hall, 3700 O'Hara Street, Pittsburgh, PA 15261, USA.
BMC Syst Biol. 2012 Sep 2;6:119. doi: 10.1186/1752-0509-6-119.
Reverse engineering gene networks and identifying regulatory interactions are integral to understanding cellular decision-making processes. Advances in high-throughput experimental techniques have initiated innovative data-driven analysis of gene regulatory networks. However, the inherent noise associated with biological systems requires numerous experimental replicates for reliable conclusions. Furthermore, robust algorithms that directly exploit basic biological traits are few. Such algorithms are expected to be efficient in their performance and robust in their prediction.
We have developed a network identification algorithm to accurately infer both the topology and strength of regulatory interactions from time-series gene expression data in the presence of significant experimental noise and non-linear behavior. In this novel formalism, we address data variability in biological systems by integrating network identification with the bootstrap resampling technique, thereby predicting robust interactions from a limited number of noisy experimental replicates. Furthermore, we incorporate non-linearity in gene dynamics using the S-system formulation. The basic network identification formulation exploits the sparsity of biological interactions. To that end, the identification algorithm is formulated as an integer-programming problem by introducing a binary variable for each network component. The objective function minimizes the number of network connections, subject to the constraint of maximal agreement between the experimental and predicted gene dynamics. The developed algorithm is validated using both in silico and experimental data sets. These studies show that the algorithm can accurately predict the topology and connection strength of the in silico networks, as quantified by high precision and recall and by a small discrepancy between the actual and predicted kinetic parameters. Furthermore, in both the in silico and experimental case studies, the predicted gene expression profiles are in very close agreement with the dynamics of the input data.
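As a rough illustration of two ingredients named above, the following sketch evaluates the standard S-system rate law, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij, and draws bootstrap resamples of replicate time series. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the data layout (replicates[r][t][i] for replicate r, time point t, gene i), and the choice to average each resample are all hypothetical, and the integer-programming step is not shown.

```python
import math
import random


def s_system_rhs(x, alpha, beta, g, h):
    """Evaluate the S-system rate law for every gene.

    x      : current expression levels, one positive value per gene
    alpha  : production rate constants, one per gene
    beta   : degradation rate constants, one per gene
    g, h   : kinetic-order matrices; g[i][j] (h[i][j]) is the effect of
             gene j on the production (degradation) of gene i.
             A zero entry means "no connection", which is what a
             sparsity-seeking identification would favor.
    """
    n = len(x)
    rhs = []
    for i in range(n):
        production = alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        degradation = beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        rhs.append(production - degradation)
    return rhs


def bootstrap_mean_profiles(replicates, n_boot, seed=0):
    """Resample replicates with replacement and average each resample.

    replicates : nested list indexed as replicates[r][t][i]
                 (assumed layout: replicate, time point, gene)
    n_boot     : number of bootstrap samples to draw
    Returns a list of n_boot mean profiles, each indexed as [t][i].
    """
    rng = random.Random(seed)
    n_rep = len(replicates)
    n_time = len(replicates[0])
    n_genes = len(replicates[0][0])
    samples = []
    for _ in range(n_boot):
        # Draw n_rep replicate indices with replacement.
        drawn = [rng.randrange(n_rep) for _ in range(n_rep)]
        mean_profile = [
            [sum(replicates[r][t][i] for r in drawn) / n_rep
             for i in range(n_genes)]
            for t in range(n_time)
        ]
        samples.append(mean_profile)
    return samples
```

In a workflow of this kind, each bootstrap profile would be passed to the network identification step, and only the connections recovered consistently across resamples would be retained as robust; how that consistency is scored is a design choice not specified in this abstract.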
Our integer-programming algorithm effectively utilizes bootstrapping to identify robust gene regulatory networks from noisy, non-linear time-series gene expression data. Because significant noise and non-linearities are inherent to biological systems, the present formalism, with its incorporation of network sparsity, is extremely relevant to gene regulatory networks. Although the formulation has been validated against in silico and E. coli data, it can be applied to any biological system.