Gosai SJ, Castro RI, Fuentes N, Butts JC, Kales S, Noche RR, Mouri K, Sabeti PC, Reilly SK, Tewhey R
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Harvard Graduate Program in Biological and Biomedical Science, Boston, MA, USA.
bioRxiv. 2023 Aug 9:2023.08.08.552077. doi: 10.1101/2023.08.08.552077.
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing, and stimulus responses, which collectively define the thousands of unique cell types in the body. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for an intended purpose has arisen naturally through evolution. Here, we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell type specificity. We leverage innovations in deep neural network modeling of CRE activity across three cell types, efficient in silico optimization, and massively parallel reporter assays (MPRAs) to design and empirically test thousands of CREs. Through in vitro and in vivo validation, we show that synthetic sequences outperform natural sequences from the human genome in driving cell type-specific expression. Synthetic sequences leverage unique sequence syntax to promote activity in the on-target cell type and simultaneously reduce activity in off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs and demonstrate the required literacy to write regulatory code that is fit-for-purpose across vertebrates.
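The abstract describes designing sequences by optimizing predicted activity so that it is high in the on-target cell type and low in off-target cells, but it does not spell out the optimizer. The sketch below illustrates that general idea only, under loudly stated assumptions: the k-mer scoring model (`TOY_WEIGHTS`, `predict`) is a hypothetical stand-in for a trained deep neural network, its k-mers and weights are arbitrary and carry no biological meaning, the objective (on-target activity minus the strongest off-target activity) is an assumed specificity score, and greedy single-base hill climbing is a generic technique swapped in for illustration, not the authors' method.

```python
import random

ALPHABET = "ACGT"

# HYPOTHETICAL stand-in for a trained sequence-to-activity model.
# Each of three cell types rewards or penalises a few arbitrary k-mers;
# a real pipeline would call a neural network here instead.
TOY_WEIGHTS = {
    "on_target": {"AAGT": 1.0, "CCTA": -0.5},
    "off_target_1": {"CCTA": 1.0, "AAGT": 0.2},
    "off_target_2": {"TGGC": 1.0, "AAGT": 0.1},
}


def count_kmer(seq, kmer):
    """Count overlapping occurrences of kmer in seq."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def predict(seq):
    """Return toy predicted activity for every cell type as a dict."""
    return {
        cell: sum(w * count_kmer(seq, kmer) for kmer, w in weights.items())
        for cell, weights in TOY_WEIGHTS.items()
    }


def specificity(seq, on="on_target"):
    """Assumed objective: on-target activity minus the highest off-target activity."""
    scores = predict(seq)
    strongest_off = max(v for cell, v in scores.items() if cell != on)
    return scores[on] - strongest_off


def greedy_optimize(seq, steps=50):
    """Greedy hill climbing: each step applies the single-base substitution
    that most increases the specificity score; stops at a local optimum."""
    best = specificity(seq)
    for _ in range(steps):
        candidate, cand_score = None, best
        for i, current in enumerate(seq):
            for base in ALPHABET:
                if base == current:
                    continue
                mutant = seq[:i] + base + seq[i + 1:]
                score = specificity(mutant)
                if score > cand_score:
                    candidate, cand_score = mutant, score
        if candidate is None:
            break  # no single substitution improves the objective
        seq, best = candidate, cand_score
    return seq, best


if __name__ == "__main__":
    rng = random.Random(0)
    start = "".join(rng.choice(ALPHABET) for _ in range(40))
    designed, gap = greedy_optimize(start)
    print("start:   ", start, round(specificity(start), 2))
    print("designed:", designed, round(gap, 2))
```

Subtracting the maximum off-target prediction, rather than the mean, penalises a design as soon as any single off-target cell type becomes active, which matches the stated goal of reducing activity in off-target cells. A greedy search like this one can stall in local optima; the sketch makes no claim about which optimizer the platform actually uses.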