Bioengineering Department, The University of Texas at Dallas, Richardson, TX, USA.
Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, USA.
Sci Rep. 2022 Jan 27;12(1):1481. doi: 10.1038/s41598-022-05575-3.
Two common hemoglobinopathies, sickle cell disease (SCD) and β-thalassemia, arise from genetic mutations within the β-globin gene. In this work, we identified a 500-bp motif (Fetal Chromatin Domain, FCD) upstream of the human γ-globin locus and showed that removing this motif using CRISPR technology reactivates the expression of γ-globin. Next, we present two different cell morphology-based machine learning approaches that can be used to identify human blood cells (KU-812) that harbor CRISPR-mediated FCD genetic modifications. Three candidate models from the first approach, which uses a multilayer perceptron algorithm (MLP 20-26, MLP 26-18, and MLP 30-26) and flow cytometry-derived cellular data, yielded 0.83 precision, 0.80 recall, 0.82 accuracy, and 0.90 area under the receiver operating characteristic (ROC) curve when predicting the edited cells. In comparison, the candidate model from the second approach, which uses deep learning (T2D5) and DIC microscopy-derived imaging data, achieved lower accuracy (0.80) and ROC AUC (0.87). We envision that equivalent machine learning-based models can complement currently available genotyping protocols for specific genetic modifications that result in morphological changes in human cells.
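For readers less familiar with the metrics quoted above, the following is a minimal sketch of how precision, recall, accuracy, and ROC AUC are defined for a binary edited-versus-unedited classifier. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for illustration.

```python
# Illustrative only: toy labels and scores, not data from the study.
# Label 1 = edited cell, label 0 = unedited cell (an assumed encoding).

def classification_metrics(y_true, y_pred):
    """Precision, recall, and accuracy from binary labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for six cells.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Precision, recall, and accuracy depend on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds, which is why both kinds of figure are typically quoted together.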