Sayyed-Ahmad Abdallah, Tuncay Kagan, Ortoleva Peter J
Center for Cell and Virus Theory, Department of Chemistry, Indiana University, Bloomington, IN 47405, USA.
BMC Bioinformatics. 2007 Jan 23;8:20. doi: 10.1186/1471-2105-8-20.
Gene expression microarray and other multiplex data hold promise for addressing the challenges of cellular complexity, refined diagnoses and the discovery of well-targeted treatments. A new approach to the construction and quantification of transcriptional regulatory networks (TRNs) is presented that integrates gene expression microarray data and cell modeling through information theory. Given a partial TRN and time series data, a probability density is constructed that is a functional of the time course of transcription factor (TF) thermodynamic activities at the site of gene control, and is a function of mRNA degradation and transcription rate coefficients, and equilibrium constants for TF/gene binding.
Our approach yields more physicochemical information that complements the results of network structure delineation methods, and thereby can serve as an element of a comprehensive TRN discovery/quantification system. The most probable TF time courses and values of the aforementioned parameters are obtained by maximizing the probability obtained through entropy maximization. Observed time delays between mRNA expression and activity are accounted for implicitly, since the time course of the activity of a TF is coupled by probability functional maximization and is not assumed to be proportional to the expression level of the mRNA type that translates into the TF. This allows one to investigate post-translational and TF activation mechanisms of gene regulation. Accuracy and robustness of the method are evaluated. A kinetic formulation is used to facilitate the analysis of phenomena with a strongly dynamical character, while a physically motivated regularization of the TF time course is found to overcome difficulties due to omnipresent noise and data sparsity that plague other methods of gene expression data analysis. An application to Escherichia coli is presented.
Multiplex time series data can be used for the construction of the network of cellular processes and the calibration of the associated physicochemical parameters. We have demonstrated these concepts in the context of gene regulation understood through the analysis of gene expression microarray time series data. Casting the approach in a probabilistic framework has allowed us to address the uncertainties in gene expression microarray data. Our approach was found to be robust to error in the gene expression microarray data and mistakes in a proposed TRN.
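The ingredients named in the abstract (a kinetic mRNA model with transcription and degradation rate coefficients, an equilibrium constant for TF/gene binding, and a regularized TF activity time course scored by a probability) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the single-site activator occupancy, the Gaussian misfit term, the quadratic roughness penalty, and the names `simulate_mrna`, `roughness`, `neg_log_prob`, `k_tx`, `k_deg`, `K_bind`, `sigma` and `lam`. The sketch only evaluates the objective for candidate TF time courses; it does not perform the maximization over time courses and parameters described above.

```python
import math


def simulate_mrna(activity, dt, k_tx, k_deg, K_bind, m0=0.0):
    """Forward-Euler integration of dm/dt = k_tx * occupancy - k_deg * m.

    Assumed form: one activating TF binding a single site at equilibrium.
    """
    m = [m0]
    for a in activity[:-1]:
        occupancy = K_bind * a / (1.0 + K_bind * a)  # fraction of gene copies bound
        m.append(m[-1] + dt * (k_tx * occupancy - k_deg * m[-1]))
    return m


def roughness(activity, dt):
    """Integrated squared slope of the TF time course (assumed regularizer)."""
    return sum(((b - a) / dt) ** 2 for a, b in zip(activity, activity[1:])) * dt


def neg_log_prob(activity, observed, dt, k_tx, k_deg, K_bind, sigma=0.1, lam=1.0):
    """Negative log-probability up to a constant: data misfit plus smoothness penalty."""
    predicted = simulate_mrna(activity, dt, k_tx, k_deg, K_bind, m0=observed[0])
    misfit = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / (2.0 * sigma ** 2)
    return misfit + lam * roughness(activity, dt)


# Synthetic illustration: a smooth "true" TF activity and a jagged alternative.
dt = 0.1
true_a = [1.0 + 0.5 * math.sin(0.2 * i) for i in range(50)]
observed = simulate_mrna(true_a, dt, k_tx=2.0, k_deg=0.5, K_bind=1.0)
jagged = [a + (0.3 if i % 2 else -0.3) for i, a in enumerate(true_a)]

score_true = neg_log_prob(true_a, observed, dt, 2.0, 0.5, 1.0)
score_jagged = neg_log_prob(jagged, observed, dt, 2.0, 0.5, 1.0)
print(score_true < score_jagged)
```

In this toy setting the smooth time course that generated the data scores better than the jagged one, which is the role the abstract assigns to regularization when data are noisy and sparse. A real analysis would hand an objective of this kind to a numerical optimizer and treat the rate coefficients and binding constant as unknowns as well.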