Kim Haseong, Lee Jae K, Park Taesung
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, San 56-1, Shilim-dong, Korea.
J Bioinform Comput Biol. 2009 Aug;7(4):717-35. doi: 10.1142/s0219720009004278.
Gene regulatory network modeling plays a key role in the search for relationships among genes. Many modeling approaches have been introduced to find causal relationships between genes using time series microarray data. However, they suffer from high dimensionality, overfitting, and heavy computation time. Further, a model selected from among several competing models is not guaranteed to be the best one. In this study, we propose a simple procedure for constructing large-scale gene regulatory networks using a regression-based network approach. We determine the optimal out-degree of the network structure using the sum of squared coefficients obtained from all appropriate regression models. Using simulated data, we compute the accuracy of estimation and the robustness against noise and compare them with those of the vector autoregressive model. Our method shows high accuracy and robustness in inferring large-scale gene networks. We also apply it to Caulobacter crescentus cell cycle data consisting of 1472 genes. The results show that many genes are regulated by two transcription factors, ctrA and gcrA, which are known global regulators.
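The abstract does not give the details of the procedure, so the following is only a minimal sketch of the general idea of scoring edges by summed squared regression coefficients, not the authors' method. It assumes a first-order lag (expression at time t predicts expression at time t+1), ordinary least squares, and regressor subsets of a fixed size; the function names `squared_coefficient_scores` and `simulate`, the subset size, and the toy network are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def squared_coefficient_scores(expr, subset_size=2):
    """Sum squared lag-1 regression coefficients for every candidate edge.

    expr is a (T, G) array: T time points, G genes.
    Returns a (G, G) array whose entry [i, j] is the sum, over every
    regressor subset containing gene i, of the squared coefficient of
    gene i when predicting gene j at the next time point.
    Assumption: lag 1 and fixed-size subsets are illustrative choices.
    """
    n_time, n_genes = expr.shape
    past, future = expr[:-1], expr[1:]
    intercept = np.ones(n_time - 1)
    scores = np.zeros((n_genes, n_genes))
    for target in range(n_genes):
        y = future[:, target]
        for subset in itertools.combinations(range(n_genes), subset_size):
            cols = list(subset)
            design = np.column_stack([intercept, past[:, cols]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            # coef[0] is the intercept; the rest belong to the regulators.
            scores[cols, target] += coef[1:] ** 2
    return scores


def simulate(n_time=200, seed=0):
    """Generate toy lag-1 data from a small hypothetical network."""
    rng = np.random.default_rng(seed)
    n_genes = 5
    true_coef = np.zeros((n_genes, n_genes))  # [i, j]: effect of i on j
    true_coef[0, 1] = 0.9
    true_coef[0, 2] = -0.8
    true_coef[3, 4] = 0.7
    expr = np.zeros((n_time, n_genes))
    expr[0] = rng.normal(size=n_genes)
    for t in range(1, n_time):
        noise = rng.normal(scale=0.5, size=n_genes)
        expr[t] = expr[t - 1] @ true_coef + noise
    return expr, true_coef


if __name__ == "__main__":
    expr, true_coef = simulate()
    scores = squared_coefficient_scores(expr)
    # Row sums give a rough per-regulator total over all its outgoing edges.
    print(np.round(scores, 2))
    print(np.round(scores.sum(axis=1), 2))
```

In this sketch each regulator's score is accumulated across many small regressions rather than taken from one selected model, which is one way to avoid committing to a single "best" model. Exhaustive enumeration of subsets grows combinatorially with the number of genes, so a data set with 1472 genes would need a restricted candidate set or a different search strategy.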