Liu R, Blackwell T W, States D J
Center for Computational Biology and Department of Genetics, Washington University School of Medicine, 700 S. Euclid Ave, St Louis, MO 63110, USA.
Bioinformatics. 2001 Jul;17(7):622-33. doi: 10.1093/bioinformatics/17.7.622.
Current methods for identifying sequence-specific binding sites in DNA using position-specific weight matrices are limited in both sensitivity and specificity. The double-stranded DNA helix exhibits sequence-dependent variations in conformation, and interactions between macromolecules result from the complementarity of their tertiary structures. We hypothesize that this conformational variation plays a role in transcription factor binding site recognition, and that using this structural information will improve the predictive power of transcription factor binding site models.
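For readers unfamiliar with the baseline being criticized, a position-specific weight matrix scan can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the pseudocount, the uniform background frequency, and the toy sites are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # log2 ratio of observed base frequency to background frequency
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan(pwm, sequence):
    """Score every window of the matrix width along the sequence."""
    width = len(pwm)
    return [(i, score_window(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

# Toy aligned sites, invented for illustration (not real binding sites).
toy_sites = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA", "ACGTTGCT"]
pwm = build_pwm(toy_sites)
hits = scan(pwm, "GGGACGTTGCAGGG")
best_pos, best_score = max(hits, key=lambda h: h[1])
```

Because each position contributes independently to the score, a matrix of this kind cannot capture the conformational effects of neighboring bases, which is the limitation the hypothesis above addresses.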
Conformational models for the sequence dependence of DNA helix distortion have been developed. Using these models, we defined a tertiary structure template for the binding site of MetJ, the met operon repressor. Both naturally occurring sites and precursor binding sites identified through in vitro selection were used as the basis for template definition. The conformational model appears to recognize features of protein binding sites that are distinct from those recognized by primary sequence-based profiles. Combining the conformational model with the primary sequence profile yields a hybrid model with better discriminatory power than either the conformational model or the sequence profile alone. Using the hybrid model, we searched the E. coli genome and identified the documented MetJ sites in the promoter regions of metA, metB, metC, metR and metF. In addition, we found several novel loci with characteristics suggesting that they are functional MetJ repressor binding sites. Novel MetJ binding sites were found upstream of the metK gene and upstream of abc, a gene that encodes a component of a multifunction transporter that may transport amino acids across the membrane. The false positive rate is significantly lower than that of the sequence profile method.
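The abstract does not say how the conformational and sequence scores are combined, so the following is only a sketch of one plausible arrangement: a structural template built by averaging per-step parameters over aligned sites, a conformational score defined as the negative RMS deviation from that template, and a weighted sum with a sequence score. The step-parameter table, the function names, and the `weight` parameter are hypothetical; the numeric values are arbitrary placeholders, not the mean base step parameters compiled from crystal structures.

```python
import math
from itertools import product

BASES = "ACGT"

# HYPOTHETICAL per-step parameter (think of a twist angle in degrees).
# These numbers are arbitrary placeholders, not crystal-structure values.
STEP_PARAM = {a + b: 30.0 + 0.5 * ((4 * i + j) % 7)
              for (i, a), (j, b) in product(enumerate(BASES), repeat=2)}

def step_profile(seq):
    """Map a sequence to its vector of per-dinucleotide-step parameters."""
    return [STEP_PARAM[seq[k:k + 2]] for k in range(len(seq) - 1)]

def build_template(sites):
    """Average the step profiles of aligned sites into a structural template."""
    profiles = [step_profile(s) for s in sites]
    return [sum(col) / len(col) for col in zip(*profiles)]

def conformational_score(template, window):
    """Negative RMS deviation from the template (0 means a perfect match)."""
    profile = step_profile(window)
    mse = sum((p - t) ** 2 for p, t in zip(profile, template)) / len(template)
    return -math.sqrt(mse)

def hybrid_score(seq_score, conf_score, weight=0.5):
    """Weighted sum of a sequence-profile score and a conformational score.

    The linear combination and the default weight are assumptions.
    """
    return weight * seq_score + (1.0 - weight) * conf_score

template = build_template(["ACGT", "ACGT"])
match = conformational_score(template, "ACGT")
mismatch = conformational_score(template, "AAAA")
```

In practice the two scores would need to be placed on comparable scales before being combined, for example by standardizing each against its distribution over background sequence; that step is omitted here.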
Programs implementing this algorithm are available upon request. The list of crystal structures used to compile the mean base step parameters of DNA is available by anonymous ftp at http://stateslab.wustl.edu/pub/helix/StructureList.