Martyn Gabriella E, Montgomery Michael T, Jones Hank, Guo Katherine, Doughty Benjamin R, Linder Johannes, Chen Ziwei, Cochran Kelly, Lawrence Kathryn A, Munson Glen, Pampari Anusri, Fulco Charles P, Kelley David R, Lander Eric S, Kundaje Anshul, Engreitz Jesse M
Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
Basic Science and Engineering Initiative, Stanford Children's Health, Betty Irene Moore Children's Heart Center, Stanford, CA, USA.
bioRxiv. 2023 Dec 21:2023.12.20.572268. doi: 10.1101/2023.12.20.572268.
Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of a target gene in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of new sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.