Lin Eugene, Lin Chieh-Hsin, Lai Yi-Lun, Huang Chiung-Hsien, Huang Yu-Jhen, Lane Hsien-Yuan
Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, United States.
Department of Biostatistics, University of Washington, Seattle, WA, United States.
Front Psychiatry. 2018 Nov 6;9:566. doi: 10.3389/fpsyt.2018.00566. eCollection 2018.
The D-amino acid oxidase activator (DAOA, also known as G72) gene is a strong schizophrenia susceptibility gene, and elevated G72 protein levels have been reported in patients with schizophrenia. This study aimed to differentiate patients with schizophrenia from healthy individuals using single nucleotide polymorphisms (SNPs) and G72 protein levels, applying machine learning tools. A total of 149 subjects were recruited: 89 patients with schizophrenia and 60 healthy controls. Two G72 SNP genotypes (rs1421292 and rs2391191) and G72 protein levels were measured in peripheral blood. We used three machine learning algorithms (logistic regression, naive Bayes, and C4.5 decision tree) to build the optimal predictive model for distinguishing schizophrenia patients from healthy controls. The naive Bayes model using two factors, rs1421292 and G72 protein, appeared to be the best model for disease susceptibility (sensitivity = 0.7969, specificity = 0.9372, area under the receiver operating characteristic curve (AUC) = 0.9356). However, adding rs1421292 only slightly increased discriminative power compared with the naive Bayes model using G72 protein alone (sensitivity = 0.7941, specificity = 0.9503, AUC = 0.9324). Among the three models using G72 protein alone, naive Bayes had the best specificity (0.9503), while logistic regression was the most sensitive (0.8765). The findings remained similar after adjusting for age and gender. This study suggests that G72 protein alone, without the two SNPs, may be sufficient to identify schizophrenia patients. We also recommend applying naive Bayes for the best specificity and logistic regression for the best sensitivity. Larger-scale studies are warranted to confirm the findings.
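For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from a classifier's output. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration. AUC is computed here through its rank interpretation, the probability that a randomly chosen patient receives a higher score than a randomly chosen control.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = patient, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of (patient, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]   # assumed predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason two models can have nearly equal AUC yet differ in sensitivity and specificity.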