Department of Neurology and Institute of Neurology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
Department of Biochemistry and Molecular Cell Biology, Shanghai Key Laboratory for Tumor Microenvironment and Inflammation, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
J Alzheimers Dis. 2024;99(s2):S299-S315. doi: 10.3233/JAD-230314.
Late-onset Alzheimer's disease (LOAD) is the most common type of dementia, but its pathogenesis remains unclear, and simple, convenient early diagnostic markers that predict its occurrence are lacking.
Our study aimed to identify candidate diagnostic genes for predicting LOAD using machine learning methods.
Three publicly available datasets from the Gene Expression Omnibus (GEO) database containing peripheral blood gene expression data for LOAD, mild cognitive impairment (MCI), and controls (CN) were downloaded. Differential expression analysis, the least absolute shrinkage and selection operator (LASSO), and support vector machine recursive feature elimination (SVM-RFE) were used to identify LOAD diagnostic candidate genes. These candidate genes were then validated in the validation group and clinical samples, and a LOAD prediction model was established.
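The two feature-selection steps named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (not the GEO datasets), the gene names are placeholders, a linear LASSO fitted to the 0/1 label via scikit-learn's `LassoCV` stands in for whatever LASSO implementation the study used, and the number of features kept by SVM-RFE (10) is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a samples-by-genes expression matrix (assumption).
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # placeholder names
X = StandardScaler().fit_transform(X)

# LASSO: the L1 penalty shrinks most coefficients to exactly zero;
# genes with a non-zero coefficient are retained.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_genes = set(genes[lasso.coef_ != 0])

# SVM-RFE: repeatedly fit a linear SVM and drop the lowest-weight gene
# until the requested number of genes remains.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10, step=1).fit(X, y)
svm_genes = set(genes[rfe.support_])

# Candidate genes are those selected by both methods.
candidates = sorted(lasso_genes & svm_genes)
print(candidates)
```

Taking the intersection keeps only genes that both a sparse linear model and a margin-based ranking consider informative, which is one common way to combine the two methods.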
LASSO and SVM-RFE analyses identified three mitochondria-related genes (MRGs) as candidate genes: NDUFA1, NDUFS5, and NDUFB3. In validation of the three MRGs, AUC values showed that NDUFA1 and NDUFS5 had better predictive performance. The candidate MRGs were also evaluated in the MCI groups, where the AUC values indicated good performance. A LOAD diagnostic model built from NDUFA1, NDUFS5, and age achieved an AUC of 0.723. qRT-PCR experiments on clinical blood samples showed that expression of all three candidate genes was significantly lower in the LOAD and MCI groups than in CN.
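To make the AUC figures concrete, the sketch below scores toy samples with a logistic model of two expression values plus age and computes the AUC as the probability that a randomly chosen case outscores a randomly chosen control. Everything numeric here is an assumption for illustration: the coefficients and sample values are invented, and the abstract does not state the form of the authors' model.

```python
import math

def risk_score(ndufa1, ndufs5, age, coef):
    """Logistic score from two expression values and age."""
    z = (coef["intercept"] + coef["NDUFA1"] * ndufa1
         + coef["NDUFS5"] * ndufs5 + coef["age"] * age)
    return 1.0 / (1.0 + math.exp(-z))

def auc(scores, labels):
    """AUC = P(case score > control score); ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical coefficients: negative for the genes (lower expression
# in cases), positive for age. Not taken from the study.
COEF = {"intercept": -5.0, "NDUFA1": -0.8, "NDUFS5": -0.6, "age": 0.1}

# Invented toy samples: (NDUFA1, NDUFS5, age, label), label 1 = case.
samples = [
    (8.1, 7.9, 70, 0),
    (7.2, 7.0, 78, 1),
    (8.4, 8.2, 66, 0),
    (7.5, 7.6, 74, 1),
    (7.9, 7.4, 81, 1),
    (7.4, 7.3, 76, 0),
]

scores = [risk_score(a, s, age, COEF) for a, s, age, _ in samples]
labels = [label for *_, label in samples]
model_auc = auc(scores, labels)
print(f"AUC on toy data: {model_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value such as 0.723 means a case receives a higher score than a control in roughly 72% of case-control pairs.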
Two mitochondria-related candidate genes, NDUFA1 and NDUFS5, were identified as diagnostic markers for LOAD and MCI. A LOAD diagnostic prediction model was successfully constructed by combining these two candidate genes with age.