Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA.
Nature. 2024 Oct;634(8036):1211-1220. doi: 10.1038/s41586-024-08070-z. Epub 2024 Oct 23.
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
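The abstract describes designing synthetic CREs by optimizing DNA sequences in silico against a model that predicts activity in several cell types. The sketch below shows that idea in miniature and is not the authors' pipeline. Several parts are assumptions made for illustration: a hypothetical motif-counting function (`toy_predict`) stands in for the trained deep neural network, and the cell-type labels and motifs (`TOY_MOTIFS`) are placeholders. The objective is assumed to be on-target activity minus the strongest off-target activity, and simulated annealing is used as one generic choice of search method.

```python
import math
import random

ALPHABET = "ACGT"

# Hypothetical placeholder motifs: each toy "cell type" responds to one motif.
# A real design loop would query a trained sequence-to-activity model instead.
TOY_MOTIFS = {"cell_A": "GATA", "cell_B": "TGAC", "cell_C": "CCAAT"}


def toy_predict(seq):
    """Hypothetical stand-in for a trained model: the activity in each cell
    type is the number of non-overlapping copies of that cell type's motif."""
    return {ct: seq.count(motif) for ct, motif in TOY_MOTIFS.items()}


def specificity(seq, target):
    """Assumed objective: on-target activity minus the largest off-target activity."""
    pred = toy_predict(seq)
    off_target = max(v for ct, v in pred.items() if ct != target)
    return pred[target] - off_target


def mutate(seq, rng):
    """Propose a single-base substitution at a random position."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def anneal(target, length=60, steps=5000, t0=2.0, seed=0):
    """Simulated annealing over sequences; returns the best sequence seen and its score."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(ALPHABET) for _ in range(length))
    score = specificity(seq, target)
    best, best_score = seq, score
    for step in range(steps):
        # Linearly cooling temperature; the small constant avoids division by zero.
        temp = t0 * (1 - step / steps) + 1e-6
        cand = mutate(seq, rng)
        cand_score = specificity(cand, target)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            seq, score = cand, cand_score
            if score > best_score:
                best, best_score = seq, score
    return best, best_score


if __name__ == "__main__":
    designed, margin = anneal("cell_A")
    print(designed, margin)
```

Replacing `toy_predict` with a real model's predictions would leave the search loop unchanged. The margin objective rewards sequences that are active in the target cell type while staying quiet in the others, which is the on-target versus off-target trade-off the abstract describes.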