Lin Eugene, Lin Chieh-Hsin, Hung Chung-Chieh, Lane Hsien-Yuan
Department of Biostatistics, University of Washington, Seattle, WA, United States.
Department of Electrical & Computer Engineering, University of Washington, Seattle, WA, United States.
Front Bioeng Biotechnol. 2020 Jun 4;8:569. doi: 10.3389/fbioe.2020.00569. eCollection 2020.
In the wake of recent advances in artificial intelligence research, precision psychiatry using machine learning techniques represents a new paradigm. The D-amino acid oxidase (DAO) protein and its interaction partner, the D-amino acid oxidase activator (DAOA, also known as G72) protein, have been implicated as two key proteins in the N-methyl-D-aspartate receptor (NMDAR) pathway for schizophrenia. Another potential biomarker for the etiology of schizophrenia is melatonin, in the tryptophan catabolic pathway. To develop an ensemble boosting framework with random undersampling for determining schizophrenia disease status, we built a prediction approach from genomic and demographic variables (DAO levels, G72 levels, melatonin levels, age, and gender) of 355 schizophrenia patients and 86 unrelated healthy individuals in the Taiwanese population. We compared our ensemble boosting framework with other state-of-the-art algorithms: support vector machine, multilayer feedforward neural networks, logistic regression, random forests, naive Bayes, and C4.5 decision tree. The analysis revealed that the ensemble boosting model with random undersampling [area under the receiver operating characteristic curve (AUC) = 0.9242 ± 0.0652; sensitivity = 0.8580 ± 0.0770; specificity = 0.8594 ± 0.0760] performed best among the predictive models in inferring the complicated relationship between schizophrenia disease status and biomarkers. In addition, we identified a causal link between DAO and G72 protein levels in influencing schizophrenia disease status. The study indicates that the ensemble boosting framework with random undersampling may provide a suitable method for building a tool that distinguishes schizophrenia patients from healthy controls using molecules in the NMDAR and tryptophan catabolic pathways.
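The abstract does not describe the implementation, so the following is only a minimal sketch of how boosting can be combined with random undersampling when one class is much larger than the other (here 355 patients versus 86 controls). It assumes a simplified, RUSBoost-style AdaBoost over decision stumps, and it should not be read as the authors' method. The data are synthetic: the three feature columns are hypothetical stand-ins, not study measurements, and the helper names (`undersample`, `fit_stump`, `fit_boosted`) are made up for illustration. The sketch also uses a single hold-out split rather than whatever validation scheme the study used.

```python
import math
import random

def undersample(idx_by_class, rng):
    """Randomly drop rows of the larger class so both classes have equal size."""
    n = min(len(idx) for idx in idx_by_class.values())
    picked = []
    for idx in idx_by_class.values():
        picked.extend(rng.sample(idx, n))
    return picked

def stump_predict(stump, x):
    j, threshold, sign = stump
    return sign if x[j] > threshold else -sign

def fit_stump(X, y, w, idx):
    """Pick the single-feature threshold rule with the lowest weighted error on idx."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({X[i][j] for i in idx}):
            for sign in (1, -1):
                stump = (j, threshold, sign)
                err = sum(w[i] for i in idx if stump_predict(stump, X[i]) != y[i])
                if best is None or err < best[0]:
                    best = (err, stump)
    return best[1]

def fit_boosted(X, y, rounds=20, seed=0):
    """AdaBoost-style loop; each weak learner is fit on a freshly undersampled subset."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n
    idx_by_class = {c: [i for i in range(n) if y[i] == c] for c in (-1, 1)}
    model = []
    for _ in range(rounds):
        idx = undersample(idx_by_class, rng)
        stump = fit_stump(X, y, w, idx)
        # Weighted error and reweighting use the full training set.
        err = sum(w[i] for i in range(n) if stump_predict(stump, X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [w[i] * math.exp(-alpha * y[i] * stump_predict(stump, X[i])) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, x):
    return sum(alpha * stump_predict(stump, x) for alpha, stump in model)

def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == -1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data only: class sizes mirror the abstract, values are random.
data_rng = random.Random(42)

def make_rows(n, shift):
    return [[data_rng.gauss(shift, 1.0), data_rng.gauss(0.5 * shift, 1.0),
             data_rng.gauss(0.0, 1.0)] for _ in range(n)]

patients, controls = make_rows(355, 2.0), make_rows(86, 0.0)
X_train, y_train = patients[:248] + controls[:60], [1] * 248 + [-1] * 60
X_test, y_test = patients[248:] + controls[60:], [1] * 107 + [-1] * 26

model = fit_boosted(X_train, y_train)
test_scores = [score(model, x) for x in X_test]
y_pred = [1 if s > 0 else -1 for s in test_scores]
sens, spec = sensitivity_specificity(y_test, y_pred)
auc_value = auc(y_test, test_scores)
print(f"AUC={auc_value:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Undersampling inside each boosting round, rather than once up front, lets every weak learner see a balanced subset while the ensemble as a whole still draws on most of the majority-class rows. The printed numbers describe the synthetic data only and are not comparable to the values reported in the abstract.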