Zhou Xuan, Feng Renxu, Ding Nana, Cao Wenyan, Liu Yang, Zhou Shenghu, Deng Yu
School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China.
School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China.
Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf863.
Core promoters are essential regulatory elements that control transcription initiation, but accurately predicting and designing their strength remains challenging because of complex sequence-function relationships and the limited generalizability of existing AI-based approaches. To address this, we developed a modular platform that integrates rational library design, predictive modelling, and generative optimization into a closed-loop workflow for end-to-end core promoter engineering. The conserved and spacer regions of core promoters exert distinct effects on transcriptional strength, with the former driving large-scale variation and the latter enabling finer gradation. Based on this insight, we used a Mutation-Barcoding-Reverse Sequencing approach to construct a synthetic promoter library comprising 112 955 variants with minimal redundancy and a 16 226-fold expression range. A Transformer-based model trained on this dataset achieved a Pearson correlation of 0.87 with experimentally measured promoter strengths. When combined with a conditional diffusion model, the system enabled de novo generation of promoter sequences with defined strengths, achieving a design-to-measurement correlation of 0.95 and maintaining high accuracy (R = 0.93) across varied sequence contexts. The designed promoters consistently preserved their intended strength gradients, demonstrating robust plug-and-play functionality. This work establishes a scalable and extensible platform (www.yudenglab.com) for deep learning-guided programmable design of Escherichia coli core promoters, enabling precise transcriptional control.
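The two summary statistics quoted in the abstract, the Pearson correlation between predicted (or designed) and measured strengths and the fold range of expression across a library, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `pearson_r` and `fold_range` are hypothetical, the input values are made up for illustration, and comparing strengths on a log10 scale is an assumption that the abstract does not state.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def fold_range(strengths):
    """Ratio of the strongest to the weakest strictly positive strength."""
    weakest, strongest = min(strengths), max(strengths)
    if weakest <= 0:
        raise ValueError("strengths must be strictly positive")
    return strongest / weakest


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    measured = [12.0, 150.0, 900.0, 4300.0, 21000.0]
    predicted = [20.0, 110.0, 1200.0, 3900.0, 18000.0]

    # Assumption: strengths are compared on a log10 scale, because the
    # library spans several orders of magnitude.
    log_measured = [math.log10(v) for v in measured]
    log_predicted = [math.log10(v) for v in predicted]

    print("Pearson r (log10 scale):", pearson_r(log_predicted, log_measured))
    print("Fold range of measured strengths:", fold_range(measured))
```

Working on a log scale keeps the few strongest promoters from dominating the correlation; on a linear scale the same data could give a noticeably different coefficient.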