Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden.
Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, SI1000, Ljubljana, Slovenia.
Nat Commun. 2022 Aug 30;13(1):5099. doi: 10.1038/s41467-022-32818-8.
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Mutagenesis-based approaches typically require screening sizable random DNA libraries, which limits designs to a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GANs) that learns directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels that spans the whole gene regulatory structure, including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly expressed synthetic sequences surpass the expression levels of highly expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue.