He Adam Y, Danko Charles G
Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University.
Graduate Field of Computational Biology, Cornell University.
bioRxiv. 2024 Sep 17:2024.03.13.583868. doi: 10.1101/2024.03.13.583868.
How the DNA sequence of cis-regulatory elements encodes transcription initiation patterns remains poorly understood. Here we introduce CLIPNET, a deep learning model trained on population-scale PRO-cap data that predicts the position and quantity of transcription initiation at single-nucleotide resolution from DNA sequence more accurately than existing approaches. Interpretation of CLIPNET revealed a complex regulatory syntax consisting of DNA-protein interactions in five major positions between -200 and +50 bp relative to the transcription start site, as well as more subtle positional preferences among transcriptional activators. Transcriptional activator and core promoter motifs work non-additively to encode distinct aspects of initiation, with the former driving initiation quantity and the latter initiation position. We identified core promoter motifs that explain initiation patterns in the majority of promoters and enhancers, including DPR motifs and AT-rich TBP binding sequences in TATA-less promoters. Our results provide insights into the sequence architecture governing transcription initiation.