Department of Microbiology and Immunology, University of California, 600 16th Street, San Francisco, CA 94158, USA.
Proc Natl Acad Sci U S A. 2010 Feb 16;107(7):2854-9. doi: 10.1073/pnas.0915066107. Epub 2010 Feb 1.
Sequenced bacterial genomes provide a wealth of information but little understanding of transcriptional regulatory circuits, largely because accurate prediction of promoters is difficult. We examined two important issues for accurate promoter prediction: (1) the ability to predict promoter strength and (2) the sequence properties that distinguish active from weak/inactive promoters. We addressed promoter prediction using natural core promoters recognized by the well-studied alternative sigma factor, Escherichia coli sigma(E), as a representative of group 4 sigmas, the largest sigma group. To evaluate the contribution of sequence to promoter strength and function, we used modular position weight matrix models composed of each promoter motif and a penalty score for suboptimal motif location. We find that a combination of select modules is moderately predictive of promoter strength and that imposing minimum motif scores distinguishes active from weak/inactive promoters. The combined -35/-10 score is the most important predictor of activity. Our models also identified key sequence features associated with active promoters. A conserved "AAC" motif in the -35 region is likely to be a general predictor of function for promoters recognized by group 4 sigmas. These results provide valuable insights into sequences that govern promoter strength, distinguish active and inactive promoters for the first time, and are applicable to both in vivo and in vitro measures of promoter strength.
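The modular scoring idea described in the abstract (one position weight matrix per promoter motif, a penalty for suboptimal motif spacing, and minimum motif scores to separate active from weak/inactive promoters) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The training sites below are invented toy sequences, not measured sigma(E) promoters, and the pseudocount, uniform background, 16-bp optimal spacer, linear penalty, and zero thresholds are all assumptions chosen only for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(aligned_sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length, aligned motif instances."""
    length = len(aligned_sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in aligned_sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def motif_score(pwm, site):
    """Sum the per-position log-odds scores of one motif instance."""
    return sum(column[base] for column, base in zip(pwm, site))

def spacer_penalty(spacer_len, optimal=16, per_bp=1.0):
    """Assumed linear penalty for each bp of deviation from the optimal spacer."""
    return -per_bp * abs(spacer_len - optimal)

def promoter_score(m35, m10, spacer_len, pwm35, pwm10):
    """Combined -35/-10 score plus the spacing penalty module."""
    return (motif_score(pwm35, m35)
            + motif_score(pwm10, m10)
            + spacer_penalty(spacer_len))

def is_predicted_active(m35, m10, pwm35, pwm10, min35=0.0, min10=0.0):
    """Call a promoter active only if each motif clears its minimum score."""
    return (motif_score(pwm35, m35) >= min35
            and motif_score(pwm10, m10) >= min10)

# Hypothetical toy alignments (note the "AAC" core in the -35 sites).
toy_35_sites = ["GGAACTT", "GGAACTT", "TGAACTT", "GGAACAT"]
toy_10_sites = ["TCTGA", "TCTGA", "TCAGA", "TCTAA"]
pwm35 = build_pwm(toy_35_sites)
pwm10 = build_pwm(toy_10_sites)

strong = promoter_score("GGAACTT", "TCTGA", 16, pwm35, pwm10)
shifted = promoter_score("GGAACTT", "TCTGA", 19, pwm35, pwm10)
print(f"optimal spacing: {strong:.2f}, 3 bp off: {shifted:.2f}")
```

Keeping each motif and the spacing penalty as separate functions mirrors the "modular" framing: modules can be added, dropped, or reweighted (for example by regression against measured promoter strength) without changing the others.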