Deng Xutao, Geng Huimin, Ali Hesham
Department of Computer Science, College of Information Science and Technology, Peter Kiewit Institute 378, University of Nebraska at Omaha, Omaha, NE 68182-0116, USA.
Biosystems. 2005 Aug;81(2):125-36. doi: 10.1016/j.biosystems.2005.02.007.
Reverse-engineering gene networks with linear models often leads to an underdetermined system because there are too many unknown parameters. In addition, the practical utility of linear models has remained unclear. We address these problems by developing an improved method, EXpression Array MINing Engine (EXAMINE), that infers gene regulatory networks from time-series gene expression data sets. EXAMINE draws on sparse graph theory to overcome the excessive-parameter problem with an adaptive-connectivity model and fitting algorithm. Through its incremental adaptive fitting process, EXAMINE also guarantees that the most parsimonious network structure is found. Compared with previous linear models, which use a fully connected model, EXAMINE reduces the number of parameters by a factor of O(N), thereby increasing the chance of recovering the underlying regulatory network. The fitting algorithm increases the connectivity step by step during fitting until a satisfactory fit is obtained. We performed a systematic study to explore the data-mining ability of linear models and derived a guideline for their use: if the system is small (3-20 elements), more than 90% of the regulatory pathways can be determined correctly; for a large-scale system, either clustering is needed or information beyond the expression profiles must be integrated. Combining EXAMINE with clustering, we applied it to rat central nervous system (CNS) development data comprising 112 genes and efficiently generated regulatory networks containing statistically significant pathways that had been predicted previously.
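The abstract does not give EXAMINE's equations, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions. First, the parameter arithmetic: assuming a fully connected model needs N^2 weights while a model with at most k regulators per gene needs at most kN, the reduction is a factor of N/k, i.e. O(N) for fixed k. Second, incremental connectivity: for each gene, try every regulator set of size k = 1, 2, ... and stop at the first k whose least-squares residual falls below a tolerance. The discrete-time model x(t+1) = W x(t), the exhaustive subset search, and the helper names (`simulate`, `fit_gene`, `solve_least_squares`, `parameter_counts`) are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations


def solve_least_squares(A, b):
    """Solve min ||A w - b||^2 via the normal equations (A^T A) w = A^T b.

    Uses Gauss-Jordan elimination with partial pivoting; returns None
    if the system is (numerically) singular.
    """
    n = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T b].
    M = [[sum(row[r] * row[c] for row in A) for c in range(n)]
         + [sum(row[r] * y for row, y in zip(A, b))] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def fit_gene(series, i, max_k, tol):
    """Hypothetical helper: smallest regulator set explaining gene i.

    Assumed model: x_i(t+1) = sum_{j in S} w_ij * x_j(t), with |S| = k.
    k grows from 1; the first k whose best residual sum of squares is
    <= tol is accepted. Returns (regulators, weights) or None.
    """
    n_genes = len(series[0])
    X = series[:-1]                        # states at time t
    y = [row[i] for row in series[1:]]     # gene i at time t+1
    for k in range(1, max_k + 1):
        best = None
        for regs in combinations(range(n_genes), k):
            A = [[row[j] for j in regs] for row in X]
            w = solve_least_squares(A, y)
            if w is None:
                continue
            rss = sum((sum(a * c for a, c in zip(arow, w)) - t) ** 2
                      for arow, t in zip(A, y))
            if best is None or rss < best[0]:
                best = (rss, regs, w)
        if best is not None and best[0] <= tol:
            return best[1], best[2]
    return None


def simulate(W, x0, steps):
    """Iterate x(t+1) = W x(t) to produce a noise-free time series."""
    series = [list(x0)]
    for _ in range(steps):
        prev = series[-1]
        series.append([sum(w * x for w, x in zip(row, prev)) for row in W])
    return series


def parameter_counts(n_genes, k):
    """Weights for a fully connected model (N*N) vs. k regulators per gene (k*N)."""
    return n_genes * n_genes, k * n_genes


if __name__ == "__main__":
    # Hypothetical 3-gene network; zeros mark absent regulatory links.
    W = [[0.9, 0.0, 0.0],
         [0.5, 0.4, 0.0],
         [0.0, -0.6, 0.8]]
    series = simulate(W, [1.0, 0.5, -1.0], steps=10)
    for gene in range(3):
        regs, weights = fit_gene(series, gene, max_k=2, tol=1e-9)
        print(gene, regs, [round(w, 3) for w in weights])
    print(parameter_counts(112, 3))  # (12544, 336)
```

On noise-free data from this toy network, the recovered regulator sets should match the nonzero pattern of each row of W. In this sketch the exhaustive subset search grows combinatorially with N, which is one practical reason to cluster genes before fitting a large system.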