Brown C T, Callan C G
Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
Proc Natl Acad Sci U S A. 2004 Feb 24;101(8):2404-9. doi: 10.1073/pnas.0308628100.
The cAMP response protein (CRP) is a transcription factor known to regulate many genes in Escherichia coli. Computational studies of transcription factor binding to DNA are usually based on a simple matrix model of sequence-dependent binding energy. For CRP, this model predicts many binding sites that are not known to be functional. If they are indeed spurious, the underlying binding model is called into question. We use a species comparison method to assess the functionality of a population of such predicted CRP sites in E. coli. We compare them with orthologous sites in Salmonella typhimurium identified independently by CLUSTALW alignment, and find a dependence of mutation probability on position in the site. This dependence increases with predicted site binding energy. The positions where mutation is most strongly suppressed are those where mutation would have the biggest effect on predicted binding energy. This finding suggests that many of the novel sites are functional, that the matrix model correctly estimates their binding strength, and that calculated CRP binding strength is the quantity that is conserved between species. The analysis also identifies many new E. coli binding sites and genes likely to be functional for CRP.
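The matrix model referred to above can be illustrated with a minimal sketch, assuming the usual additive form in which each position of a site contributes independently to the total binding energy. Everything in this block is hypothetical and for illustration only: the function names, the four-position `TOY_MATRIX`, its integer values, and the convention that lower energy means stronger binding are assumptions, not the CRP matrix or the code used in the paper. The `mutation_effects` helper shows the quantity the abstract compares against conservation: how much a single-base change at each position could shift the predicted energy.

```python
BASES = "ACGT"

# Hypothetical 4-position energy matrix (arbitrary units, made-up values).
# Assumed convention: lower total energy means stronger predicted binding.
TOY_MATRIX = [
    {"A": 0, "C": 2, "G": 2, "T": 1},
    {"A": 3, "C": 3, "G": 0, "T": 3},
    {"A": 1, "C": 0, "G": 1, "T": 1},
    {"A": 2, "C": 2, "G": 2, "T": 0},
]


def site_energy(site, matrix):
    """Additive matrix model: sum the per-position energy of each base."""
    if len(site) != len(matrix):
        raise ValueError("site length must match matrix width")
    return sum(row[base] for row, base in zip(matrix, site))


def mutation_effects(site, matrix):
    """Largest energy change a single-base substitution could cause, per position."""
    return [
        max(abs(row[b] - row[base]) for b in BASES if b != base)
        for row, base in zip(matrix, site)
    ]


def scan(sequence, matrix, threshold):
    """Return (start, site, energy) for every window at or below the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        energy = site_energy(window, matrix)
        if energy <= threshold:
            hits.append((start, window, energy))
    return hits


if __name__ == "__main__":
    print(site_energy("AGCT", TOY_MATRIX))
    print(mutation_effects("AGCT", TOY_MATRIX))
    print(scan("TTAGCTAA", TOY_MATRIX, threshold=1))
```

In this toy setting, positions with a large entry in `mutation_effects` are the ones where, according to the abstract, mutations between orthologous sites would be expected to be most strongly suppressed if the predicted sites are functional.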