Chiara Sabatti, Gareth M. James
Department of Human Genetics, UCLA, Los Angeles, CA 90095-7088, USA.
Bioinformatics. 2006 Mar 15;22(6):739-46. doi: 10.1093/bioinformatics/btk017. Epub 2005 Dec 20.
In systems like Escherichia coli, the abundance of sequence information, gene expression array studies and small-scale experiments makes it possible to reconstruct the regulatory network and to quantify the effects of transcription factors on gene expression. However, this goal can only be achieved if all information sources are used in concert.
Our method integrates literature information, DNA sequences and expression arrays. A set of relevant transcription factors is defined on the basis of the literature. Sequence data are used to identify potential target genes, and the results are used to define a prior distribution on the topology of the regulatory network. A Bayesian hidden component model for the expression array data allows us to identify which of the potential binding sites are actually used by the regulatory proteins under the cell conditions studied, the strength of their control, and their activation profiles across a series of experiments. We apply our methodology to 35 expression studies in E. coli with convincing results.
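As an illustration of how a sequence-derived prior on network topology can be combined with a hidden component model, the sketch below assumes a linear-Gaussian form X = (Z ∘ A) P + noise, where Z is a binary gene-by-TF connectivity matrix with independent Bernoulli prior probabilities pi, A holds control strengths and P holds TF activity profiles. This is a simplified stand-in, not the authors' model or sampler: the function names, the 0.9/0.1 prior values and the toy data are hypothetical, and A and P are held fixed rather than inferred.

```python
import math

def topology_log_prior(Z, pi):
    """Log prior of binary connectivity Z (genes x TFs) under independent
    Bernoulli edges with probabilities pi (same shape, each strictly in (0, 1))."""
    total = 0.0
    for z_row, p_row in zip(Z, pi):
        for z, p in zip(z_row, p_row):
            total += math.log(p) if z else math.log1p(-p)
    return total

def log_likelihood(X, A, Z, P, sigma):
    """Gaussian log-likelihood of expression X (genes x experiments), assuming
    X[g][j] = sum_t Z[g][t] * A[g][t] * P[t][j] + noise, noise ~ N(0, sigma^2)."""
    n_tf = len(P)
    ll = 0.0
    for g, x_row in enumerate(X):
        for j, x in enumerate(x_row):
            mean = sum(Z[g][t] * A[g][t] * P[t][j] for t in range(n_tf))
            r = x - mean
            ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - r * r / (2 * sigma ** 2)
    return ll

def log_posterior(X, A, Z, P, pi, sigma):
    """Unnormalized log posterior of topology Z: prior plus likelihood."""
    return topology_log_prior(Z, pi) + log_likelihood(X, A, Z, P, sigma)

def edge_posterior(X, A, Z, P, pi, sigma, g, t):
    """Conditional posterior probability that TF t regulates gene g,
    holding all other edges and parameters fixed (one Gibbs-style step)."""
    z_on = [row[:] for row in Z]
    z_off = [row[:] for row in Z]
    z_on[g][t], z_off[g][t] = 1, 0
    d = log_posterior(X, A, z_on, P, pi, sigma) - log_posterior(X, A, z_off, P, pi, sigma)
    # Numerically stable logistic function of the log-odds d.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

# Toy example: 3 genes, 2 TFs, 4 experiments (all values made up).
site_predicted = [[1, 0], [1, 1], [0, 1]]          # hypothetical sequence-scan hits
pi = [[0.9 if s else 0.1 for s in row] for row in site_predicted]
A = [[2.0, 0.0], [1.0, -1.5], [0.0, 0.5]]          # control strengths
P = [[1.0, 0.0, -1.0, 0.5], [0.0, 1.0, 1.0, -0.5]]  # TF activity profiles
Z_true = [row[:] for row in site_predicted]
X = [[sum(Z_true[g][t] * A[g][t] * P[t][j] for t in range(2)) for j in range(4)]
     for g in range(3)]                            # noise-free expression data
Z_empty = [[0, 0] for _ in range(3)]
sigma = 0.1
```

With noise-free toy data, the true topology scores higher than the empty one, and an edge whose strength is zero leaves the likelihood unchanged, so its posterior equals its prior.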
www.genetics.ucla.edu/labs/sabatti/software.html
Supplementary material is available at Bioinformatics online.