Zhejiang Provincial Key Laboratory of Pathophysiology, School of Medicine, Ningbo University, Ningbo, China.
Zhejiang Pharmaceutical College, Ningbo, China.
J Alzheimers Dis. 2021;83(1):163-178. doi: 10.3233/JAD-210540.
BACKGROUND: Alzheimer's disease (AD) is one of many common neurodegenerative diseases without an ideal treatment, but early detection and intervention can prevent disease progression.
OBJECTIVE: This study aimed to identify AD-related glycolysis genes for AD diagnosis and further investigation by integrated bioinformatics analysis.
METHODS: A total of 122 subjects were recruited from the affiliated hospitals of Ningbo University between 1 October 2015 and 31 December 2016. Their clinical information and the methylation levels of 8 glycolysis genes were assessed. Machine learning algorithms were used to establish an AD prediction model. The area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA) were used to assess the model. An AD risk factor model was developed with SHapley Additive exPlanations (SHAP) to extract the features with an important impact on AD. Finally, the gene expression of the AD-related glycolysis genes was validated in AlzData.
RESULTS: An AD prediction model was developed using the random forest algorithm, which achieved the best average ROC AUC (0.969544). By DCA, the model showed a positive net benefit across threshold probabilities from 0 to 0.9875. Eight glycolysis genes (GAPDHS, PKLR, PFKFB3, LDHC, DLD, ALDOC, LDHB, HK3) were identified by SHAP. Five of these genes (PFKFB3, DLD, ALDOC, LDHB, LDHC) showed significant differences in gene expression between the AD and control groups in AlzData, while three of the genes (HK3, ALDOC, PKLR) are related to the pathogenesis of AD. GAPDHS is involved in the regulatory network of AD risk genes.
CONCLUSION: By integrated bioinformatics analysis, we identified 8 AD-related glycolysis genes (GAPDHS, PFKFB3, LDHC, HK3, ALDOC, LDHB, PKLR, DLD) as promising candidate biomarkers for the early diagnosis of AD. Machine learning offers an advantage in identifying such genes.
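The two evaluation metrics named in the abstract, AUC and the net benefit used in decision curve analysis, can be illustrated with a minimal sketch. It assumes binary labels (1 = AD, 0 = control) and predicted probabilities from any classifier. The function names `roc_auc` and `net_benefit` and the toy data are hypothetical and are not taken from the study. The net benefit formula used here is the standard one, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit at one threshold probability, as plotted in a decision curve:
    TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data invented for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1]

auc = roc_auc(labels, scores)
nb = net_benefit(labels, scores, 0.5)
print(f"AUC = {auc:.4f}, net benefit at pt=0.5 = {nb:.4f}")
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies. A model is considered useful over the range of thresholds where its net benefit exceeds both.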