Research Institute of Molecular Pathology, Vienna BioCenter, Campus-Vienna-BioCenter 1, Vienna, Austria.
Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria.
Nat Genet. 2022 May;54(5):613-624. doi: 10.1038/s41588-022-01048-5. Epub 2022 May 12.
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
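As a rough illustration of what predicting activity "directly from DNA sequence" and measuring "intermotif distances" can involve, the sketch below one-hot encodes a sequence and computes distances between motif instances. It is not the DeepSTARR code, and the abstract does not describe the model's input format or architecture. The one-hot representation, the function names, the example sequence, and the two motif strings are all assumptions chosen for illustration.

```python
# Illustrative sketch only: not the DeepSTARR implementation.
# The encoding, function names, sequence, and motifs are hypothetical.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values in A, C, G, T order.

    Bases outside ACGT (for example N) become an all-zero row.
    """
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows


def motif_positions(seq, motif):
    """Return 0-based start positions of exact forward-strand motif matches."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def intermotif_distances(seq, motif_a, motif_b):
    """Start-to-start distances in bp between all motif_a and motif_b instances."""
    starts_a = motif_positions(seq, motif_a)
    starts_b = motif_positions(seq, motif_b)
    return [b - a for a in starts_a for b in starts_b]


example = "ACGATAAGTTTGACTCAN"  # hypothetical sequence, not from the paper
encoded = one_hot(example)      # one row per base
distances = intermotif_distances(example, "GATAA", "TGACTCA")
```

A sequence-to-activity model would take something like `encoded` as input, whereas `distances` is the kind of hand-computed feature that could be compared against the syntax rules a trained model appears to have learned.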