Morozov Alexandre V, Siggia Eric D
Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA.
Proc Natl Acad Sci U S A. 2007 Apr 24;104(17):7068-73. doi: 10.1073/pnas.0701356104. Epub 2007 Apr 16.
A common task posed by microarray experiments is to infer the binding site preferences for a known transcription factor from a collection of genes that it regulates and to ascertain whether the factor acts alone or in a complex. The converse problem can also be posed: Given a collection of binding sites, can the regulatory factor or complex of factors be inferred? Both tasks are substantially facilitated by using relatively simple homology models for protein-DNA interactions, as well as the rapidly expanding protein structure database. For budding yeast, we are able to construct reliable structural models for 67 transcription factors and with them redetermine factor binding sites by using a Bayesian Gibbs sampling algorithm and an extensive protein localization data set. For 49 factors in common with a prior analysis of this data set (based largely on phylogenetic conservation), we find that half of the previously predicted binding motifs are in need of some revision. We also solve the inverse problem of ascertaining the factors from the binding sites by assigning a correct protein fold to 25 of the 49 cases from a previous study. Our approach is easily extended to other organisms, including higher eukaryotes. Our study highlights the utility of enlarging current structural genomics projects that exhaustively sample fold structure space to include all factors with significantly different DNA-binding specificities.
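As an illustration of the Gibbs sampling step mentioned above, the following is a minimal Python sketch of a motif site sampler whose position weight matrix (PWM) is regularized by a prior PWM, such as one that might be derived from a structural model. It is not the authors' implementation: the function names (`column_probs`, `gibbs_sites`), the one-site-per-sequence assumption, the uniform background, and the `prior_weight` pseudocount strength are all assumptions made for this example.

```python
import random

BASES = "ACGT"

def column_probs(sites, width, prior, prior_weight):
    """Posterior-mean PWM: observed site counts plus pseudocounts from the prior PWM."""
    pwm = []
    for j in range(width):
        counts = {b: prior_weight * prior[j][b] for b in BASES}
        for site in sites:
            counts[site[j]] += 1.0
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm

def gibbs_sites(seqs, prior, prior_weight=2.0, background=None, sweeps=200, seed=0):
    """Sample one site start per sequence (assumption: exactly one site each).

    `prior` is a list of per-column dicts over ACGT with strictly positive
    entries; it stands in for a hypothetical structure-based PWM.
    """
    rng = random.Random(seed)
    width = len(prior)
    bg = background or {b: 0.25 for b in BASES}
    # Start from random site positions in every sequence.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(sweeps):
        for i, seq in enumerate(seqs):
            # Build the PWM from all sequences except the one being resampled.
            others = [seqs[k][pos[k]:pos[k] + width]
                      for k in range(len(seqs)) if k != i]
            pwm = column_probs(others, width, prior, prior_weight)
            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for start in range(len(seq) - width + 1):
                w = 1.0
                for j in range(width):
                    base = seq[start + j]
                    w *= pwm[j][base] / bg[base]
                weights.append(w)
            # Draw the new start position in proportion to those weights.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    sites = [seqs[k][pos[k]:pos[k] + width] for k in range(len(seqs))]
    return pos, column_probs(sites, width, prior, prior_weight)
```

Calling `gibbs_sites(seqs, prior)` with a list of DNA strings and a prior PWM returns the sampled start positions and the resulting PWM. A larger `prior_weight` keeps the sampled motif closer to the prior, which is one simple way to express confidence in a structural model.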