Perrin Bruno-Edouard, Ralaivola Liva, Mazurie Aurélien, Bottani Samuele, Mallet Jacques, d'Alché-Buc Florence
Laboratoire d'Informatique de Paris 6, CNRS UMR 7606, Paris, France.
Bioinformatics. 2003 Oct;19 Suppl 2:ii138-48. doi: 10.1093/bioinformatics/btg1071.
This article deals with the identification of gene regulatory networks from experimental data using a statistical machine learning approach. We propose a stochastic model of gene interactions that can handle missing variables. It can be described as a dynamic Bayesian network and is particularly well suited to the stochastic nature of both gene regulation and gene expression measurement. The model parameters are learned by penalized likelihood maximization, implemented with an extended version of the EM algorithm. We test our approach on experimental data from the S.O.S. DNA repair network of the bacterium Escherichia coli. The method appears able to extract the main regulations between the genes involved in this network, and an added missing variable is found to model the network's main protein. Good prediction performance is observed on data not used for learning. These first results are very promising: they show the power of the learning algorithm and the ability of the model to capture gene interactions.
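The abstract does not give the model equations, so the sketch below is only an illustration of the general technique it names: EM-based penalized likelihood maximization for a dynamic Bayesian network with a missing (hidden) variable. It assumes a linear-Gaussian DBN (a state-space model) in which the hidden state holds the true levels of the measured genes plus one extra unmeasured variable, the transition noise covariance is fixed to the identity, and a ridge penalty (lam/2)·‖A‖² is placed on the transition matrix A. These choices, and the names `kalman_smoother` and `em_penalized`, are hypothetical and are not the authors' model or code. The E-step uses a Kalman filter with Rauch-Tung-Striebel smoothing; the M-step updates A and the diagonal measurement noise R in closed form.

```python
import numpy as np


def kalman_smoother(Y, A, C, Q, R, mu0, P0):
    """Kalman filter + Rauch-Tung-Striebel smoother for the state-space model
    h[t+1] = A h[t] + w, y[t] = C h[t] + v, with w ~ N(0, Q), v ~ N(0, R).
    Returns smoothed means, covariances, lag-one cross-covariances
    Cov(h[t+1], h[t] | Y) and the log-likelihood log p(Y)."""
    T, k = Y.shape[0], A.shape[0]
    m_pred, P_pred = np.zeros((T, k)), np.zeros((T, k, k))
    m_filt, P_filt = np.zeros((T, k)), np.zeros((T, k, k))
    loglik = 0.0
    for t in range(T):
        if t == 0:
            m_pred[t], P_pred[t] = mu0, P0
        else:
            m_pred[t] = A @ m_filt[t - 1]
            P_pred[t] = A @ P_filt[t - 1] @ A.T + Q
        S = C @ P_pred[t] @ C.T + R              # innovation covariance
        K = P_pred[t] @ C.T @ np.linalg.inv(S)   # Kalman gain
        e = Y[t] - C @ m_pred[t]                 # innovation
        m_filt[t] = m_pred[t] + K @ e
        P_filt[t] = P_pred[t] - K @ C @ P_pred[t]
        _, logdet = np.linalg.slogdet(2 * np.pi * S)
        loglik -= 0.5 * (logdet + e @ np.linalg.solve(S, e))
    m_s, P_s = m_filt.copy(), P_filt.copy()
    P_cross = np.zeros((T - 1, k, k))
    for t in range(T - 2, -1, -1):               # backward (RTS) pass
        J = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])
        m_s[t] = m_filt[t] + J @ (m_s[t + 1] - m_pred[t + 1])
        P_s[t] = P_filt[t] + J @ (P_s[t + 1] - P_pred[t + 1]) @ J.T
        P_cross[t] = P_s[t + 1] @ J.T
    return m_s, P_s, P_cross, loglik


def em_penalized(Y, n_hidden, lam=1.0, n_iter=30, seed=0):
    """EM for a linear-Gaussian DBN whose state holds the n measured genes plus
    n_hidden extra hidden variables, with a ridge penalty (lam/2)*||A||_F^2."""
    T, n = Y.shape
    k = n + n_hidden
    A = 0.1 * np.random.default_rng(seed).standard_normal((k, k))
    C = np.hstack([np.eye(n), np.zeros((n, n_hidden))])  # hidden variables are never measured
    Q, R = np.eye(k), np.eye(n)      # Q fixed to I (assumption); R is learned
    mu0, P0 = np.zeros(k), np.eye(k)
    history = []                     # penalized log-likelihood before each update
    for _ in range(n_iter):
        # E-step: posterior moments of the hidden trajectory
        m, P, P_cross, ll = kalman_smoother(Y, A, C, Q, R, mu0, P0)
        history.append(ll - 0.5 * lam * np.sum(A ** 2))
        E_hh = P + np.einsum("ti,tj->tij", m, m)            # E[h_t h_t^T]
        S00 = E_hh[:-1].sum(axis=0)
        S10 = (P_cross + np.einsum("ti,tj->tij", m[1:], m[:-1])).sum(axis=0)
        # M-step: ridge-penalized transition matrix and diagonal noise
        A = S10 @ np.linalg.inv(S00 + lam * np.eye(k))
        resid = Y - m @ C.T
        R = np.diag((resid ** 2).mean(axis=0) + np.einsum("ij,tjk,ik->i", C, P, C) / T)
    return A, R, history


# Usage on simulated data: 3 measured genes driven partly by 1 unmeasured regulator.
rng = np.random.default_rng(1)
A_true = np.array([[0.8, 0.0, 0.0, -0.5],
                   [0.0, 0.7, 0.0, -0.4],
                   [0.0, 0.0, 0.9, 0.0],
                   [0.0, 0.0, 0.3, 0.6]])
T = 60
h = np.zeros((T, 4))
for t in range(1, T):
    h[t] = A_true @ h[t - 1] + rng.standard_normal(4)
Y = h[:, :3] + 0.3 * rng.standard_normal((T, 3))  # noisy expression measurements
A_hat, R_hat, hist = em_penalized(Y, n_hidden=1, lam=1.0, n_iter=30)
```

Because each M-step maximizes the expected penalized complete-data log-likelihood exactly, the recorded penalized log-likelihood should not decrease across iterations. With Q fixed to the identity and a zero column for the hidden variable in C, flipping the sign of the hidden coordinate leaves the likelihood unchanged, so the last row and column of `A_hat` are best read qualitatively. A larger `lam` shrinks the interaction weights toward zero; the penalty used in the paper may well differ from this ridge choice.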