Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa053.
p53 is the 'guardian of the genome' and regulates the cell cycle and apoptosis. Genomic p53 binding regions where activating transcription factors and cofactors such as p300 bind simultaneously are called 'p53-dependent enhancers' and play an important role in tumorigenesis. Current experimental assays generally yield a broad peak for each enhancer element, which limits our knowledge of critical enhancer regions (CERs). Inspired by the dissection of enhancers with a CRISPR-Cas9 screening library across genome-wide p53 binding sites, we introduce a statistical framework called 'Computational CRISPR Strategy' (CCS) that predicts whether a given DNA fragment is a p53-dependent CER, using 7-mers as features and a random forest as the regressor. When trained on a p53 CRISPR enhancer dataset, CCS not only accurately fitted the top-ranked enriched single guide RNAs (sgRNAs) but also reproduced two known, experimentally validated CERs. When applied to an independent test dataset, a tiling of a 2-kb genomic region from CRISPR-deCDKN1A-Lib, the trained model generalized well, identifying a CER that contains five top-ranked sgRNAs. A feature importance analysis further indicates that the top-ranked 7-mers map onto informative TF motifs, including POU5F1 and SOX5, which are differentially enriched in p53-dependent CERs and are potential factors that turn a general p53 binding site into a p53-dependent CER; this provides interpretability for the trained model. Our results demonstrate that CCS is an alternative to CRISPR experiments for screening the genome to map p53-dependent CERs.
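The 7-mer feature step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_counts` and `feature_vector`, the choice of raw overlapping counts over all 4^7 possible 7-mers, and the example fragment are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product

K = 7  # k-mer length named in the abstract


def kmer_counts(seq, k=K):
    """Count overlapping k-mers in a DNA sequence.

    Windows containing anything other than A, C, G, T are skipped
    (an assumption; the abstract does not say how ambiguous bases are handled).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_vector(seq, vocabulary, k=K):
    """Turn a sequence into a fixed-length list of k-mer counts."""
    counts = kmer_counts(seq, k)
    return [counts[kmer] for kmer in vocabulary]


# All 4**7 = 16384 possible 7-mers, in lexicographic order.
VOCAB = ["".join(p) for p in product("ACGT", repeat=K)]

# Hypothetical 20-nt fragment, used only to show the shape of the output.
example = "GAACATGTCCCAACATGTTG"
vec = feature_vector(example, VOCAB)
print(len(vec), sum(vec))  # prints "16384 14": 16384 features, 14 overlapping 7-mers
```

Vectors of this kind, one per DNA fragment, could then be paired with sgRNA enrichment scores and passed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`), whose feature importances would rank individual 7-mers in the way the abstract describes.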