Molecular Biology Interdepartmental Doctoral Program, University of California, Los Angeles, California 90095, United States.
Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, California 90095, United States.
Biochemistry. 2019 Mar 19;58(11):1539-1551. doi: 10.1021/acs.biochem.7b01069. Epub 2018 Dec 21.
Promoters are the key drivers of gene expression and are largely responsible for regulating cellular responses to time and environment. In Escherichia coli, decades of studies have revealed most, if not all, of the sequence elements necessary to encode promoter function. Despite our knowledge of these motifs, it is still not possible to predict the strength and regulation of a promoter from primary sequence alone. Here we develop a novel multiplexed assay to study promoter function in E. coli by building a site-specific genomic recombination-mediated cassette exchange system, which allows large libraries of genetic designs to be easily constructed, integrated into precise genomic locations, and tested. We build and test a library of 10,898 σ70 promoter variants consisting of all combinations of a set of eight −35 elements, eight −10 elements, three UP elements, eight spacers, and eight backgrounds. We find that the −35 and −10 sequence elements can explain approximately 74% of the variance in promoter strength within our data set using a simple log-linear statistical model. Simple neural network models explain more than 95% of the variance in our data set by capturing nonlinear interactions with the spacer, background, and UP elements.
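To make the combinatorial design and the "variance explained" figure concrete, here is a minimal sketch, not the authors' analysis code. It enumerates the full factorial design space (8 × 8 × 3 × 8 × 8 = 12,288 combinations; the abstract reports 10,898 tested variants and does not explain the difference here), encodes only the −35 and −10 identities as one-hot features, fits an additive model to log expression by ordinary least squares, and reports R² = 1 − SS_res/SS_tot. The element indices stand in for real sequences, and the expression values come from a hypothetical simulator (`simulate_log_expression`) with an arbitrary spacer × background term; they are not the study's measurements, and the study's actual model specification may differ.

```python
import itertools

import numpy as np

# Element counts taken from the abstract; indices are placeholders for sequences.
N_MINUS35, N_MINUS10, N_UP, N_SPACER, N_BACKGROUND = 8, 8, 3, 8, 8


def design_space():
    """Enumerate every (-35, -10, UP, spacer, background) index combination."""
    return list(itertools.product(range(N_MINUS35), range(N_MINUS10),
                                  range(N_UP), range(N_SPACER), range(N_BACKGROUND)))


def one_hot_design(designs, n_minus35=N_MINUS35, n_minus10=N_MINUS10):
    """Intercept plus one-hot columns for the -35 and -10 elements only.

    Level 0 of each element is the reference level (no column), so the
    matrix has full column rank: 1 + (8 - 1) + (8 - 1) = 15 columns.
    """
    designs = np.asarray(designs)
    X = np.zeros((len(designs), 1 + (n_minus35 - 1) + (n_minus10 - 1)))
    X[:, 0] = 1.0  # intercept
    for row, (m35, m10) in enumerate(designs[:, :2]):
        if m35 > 0:
            X[row, m35] = 1.0                  # columns 1..7
        if m10 > 0:
            X[row, n_minus35 - 1 + m10] = 1.0  # columns 8..14
    return X


def fit_log_linear(X, log_expr):
    """Least-squares fit; returns coefficients and fraction of variance explained."""
    coef, *_ = np.linalg.lstsq(X, log_expr, rcond=None)
    residuals = log_expr - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((log_expr - log_expr.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


def simulate_log_expression(designs, rng, interaction_scale=0.5, noise_sd=0.2):
    """HYPOTHETICAL data: additive -35/-10 effects, an arbitrary spacer x
    background term the additive model cannot capture, and Gaussian noise."""
    designs = np.asarray(designs)
    eff35 = rng.normal(0.0, 1.0, N_MINUS35)
    eff10 = rng.normal(0.0, 1.0, N_MINUS10)
    inter = rng.normal(0.0, interaction_scale, (N_SPACER, N_BACKGROUND))
    y = eff35[designs[:, 0]] + eff10[designs[:, 1]] + inter[designs[:, 3], designs[:, 4]]
    return y + rng.normal(0.0, noise_sd, len(designs))


if __name__ == "__main__":
    designs = design_space()
    print(len(designs))  # full factorial size: 12288
    X = one_hot_design(designs)
    y = simulate_log_expression(designs, np.random.default_rng(0))
    _, r2 = fit_log_linear(X, y)
    print(f"variance explained by additive -35/-10 model (simulated): {r2:.2f}")
```

On the simulated data, R² falls below 1 by however much variance the spacer × background term and the noise contribute. That shortfall is the gap that, per the abstract, a nonlinear model such as a small neural network can close on the real data.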