Graham Josh P, Zhang Yu, He Lifang, Gonzalez-Fernandez Tomas
Department of Bioengineering, Lehigh University, Bethlehem, PA, USA.
Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA, USA.
bioRxiv. 2024 Jul 3:2024.07.01.601587. doi: 10.1101/2024.07.01.601587.
CRISPR gene editing strategies are shaping cell therapies through precise and tunable control over gene expression. However, achieving reliable therapeutic effects with improved safety and efficacy requires informed target gene selection. This depends on a thorough understanding of how target genes participate in the gene regulatory networks (GRNs) that regulate cell phenotype and function. Machine learning models have previously been used to reconstruct GRNs from RNA-seq data, but current techniques are limited to single cell types and focus mainly on transcription factors. This restriction overlooks many potential CRISPR target genes, such as those encoding extracellular matrix components, growth factors, and signaling molecules, and so limits the applicability of these models to CRISPR strategies. To address these limitations, we have developed CRISPR-GEM, a multi-layer perceptron (MLP)-based synthetic GRN constructed to accurately predict the downstream effects of CRISPR gene editing. First, input and output nodes are identified as differentially expressed genes between defined experimental and target cell/tissue types, respectively. Then, MLP training learns regulatory relationships in a black-box manner, allowing accurate prediction of output gene expression using only input gene expression. Finally, CRISPR-mimetic perturbations are applied to each input gene individually, and the resulting model predictions are compared to those for the target group to score and assess each input gene as a CRISPR candidate. The top-scoring genes provided by CRISPR-GEM are therefore those that best modulate experimental group GRNs to drive transcriptomic shifts towards a target group phenotype. This machine learning model is the first of its kind for predicting optimal CRISPR target genes and serves as a powerful tool for enhanced CRISPR strategies across a range of cell therapies.
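The perturb-and-score step described above can be illustrated with a minimal sketch. It is not the CRISPR-GEM implementation: the abstract does not specify the perturbation or the scoring metric, so this sketch assumes that a perturbation multiplies one input gene's expression by a fixed factor, and that a gene's score is the reduction in Euclidean distance between the mean predicted output of the experimental group and that of the target group. The function names, the hand-set weights standing in for a trained MLP, and the toy expression values are all hypothetical.

```python
import numpy as np

def mlp_predict(x, w1, b1, w2, b2):
    """Forward pass of a one-hidden-layer MLP with ReLU activation."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

def score_candidates(predict, x_exp, x_target, factor):
    """Score each input gene by how much perturbing it moves the
    experimental group's predicted outputs toward the target group's.

    Assumption: the perturbation scales one gene's expression by `factor`
    (e.g. 2.0 to mimic activation, 0.0 to mimic knockout).
    """
    y_target = predict(x_target).mean(axis=0)
    baseline = np.linalg.norm(predict(x_exp).mean(axis=0) - y_target)
    scores = np.empty(x_exp.shape[1])
    for g in range(x_exp.shape[1]):
        x_pert = x_exp.copy()
        x_pert[:, g] *= factor          # perturb one input gene at a time
        dist = np.linalg.norm(predict(x_pert).mean(axis=0) - y_target)
        scores[g] = baseline - dist     # positive = moved closer to target
    return scores

# Hypothetical stand-in for a trained model: 3 input genes, 2 output genes.
# Output 0 follows input gene 0, output 1 follows input gene 1,
# and input gene 2 has no effect on either output.
w1, b1 = np.eye(3), np.zeros(3)
w2, b2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]), np.zeros(2)

def predict(x):
    return mlp_predict(x, w1, b1, w2, b2)

# Toy expression matrices (rows = samples, columns = input genes).
x_exp = np.array([[1.0, 2.0, 1.0], [1.0, 2.0, 3.0]])
x_target = np.array([[2.0, 2.0, 5.0], [2.0, 2.0, 7.0]])

scores = score_candidates(predict, x_exp, x_target, factor=2.0)
ranking = np.argsort(scores)[::-1]      # best candidate first
print(scores, ranking)
```

In this toy setup, doubling input gene 0 closes the gap to the target group, doubling gene 1 overshoots and is penalized, and gene 2 is neutral, so gene 0 ranks first. In practice the `predict` callable would wrap a model trained on real expression data, and the perturbation and distance measure would follow the authors' definitions.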