Lally Patrick, Gómez-Romero Laura, Tierrafría Víctor H, Aquino Patricia, Rioualen Claire, Zhang Xiaoman, Kim Sunyoung, Baniulyte Gabriele, Plitnick Jonathan, Smith Carol, Babu Mohan, Collado-Vides Julio, Wade Joseph T, Galagan James E
Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA.
Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México, México.
Nat Commun. 2025 May 7;16(1):4255. doi: 10.1038/s41467-025-58862-8.
The DNA binding of most Escherichia coli transcription factors (TFs) has not been comprehensively mapped, and few have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We use these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We use BoltzNet to quantitatively design novel binding sites, which we validate with biophysical experiments on purified protein. We generate models for 124 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
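To illustrate what "a quantitative biophysical model" of TF binding typically means, here is a minimal sketch of the classic additive energy-matrix model with Boltzmann (Fermi-Dirac) occupancy. This is not BoltzNet's actual architecture. The energy matrix, the chemical potential `mu`, and the function names are hypothetical. Energies are in units of kT, and only the forward strand is scanned for simplicity.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string into an (L, 4) array (columns in A, C, G, T order)."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def site_energies(seq, energy_matrix):
    """Energy of every window of width W, assuming each base contributes independently (additive model)."""
    x = one_hot(seq)
    w = energy_matrix.shape[0]
    n = len(seq) - w + 1
    return np.array([np.sum(x[i:i + w] * energy_matrix) for i in range(n)])

def occupancy(seq, energy_matrix, mu=0.0):
    """Probability each site is bound: p = 1 / (1 + exp(E - mu)), with E and mu in kT."""
    e = site_energies(seq, energy_matrix)
    return 1.0 / (1.0 + np.exp(e - mu))

# Hypothetical 4-bp motif with consensus "ACGT": each matching base lowers the energy by 2 kT.
energy_matrix = -2.0 * np.eye(4)
```

For example, `occupancy("TTACGTTT", energy_matrix)` gives the highest occupancy at window index 2, where the consensus `ACGT` sits. Sites with no matching bases have energy 0 and occupancy 0.5 at `mu=0`. A model like BoltzNet learns sequence-to-energy parameters from ChIP-Seq data instead of using a hand-set matrix.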