Department of Health Statistics, School of Public Health, Shandong Second Medical University, Weifang, Shandong, 261053, China.
Department of Rheumatology and Immunology, Affiliated Hospital of Shandong Second Medical University, Weifang, Shandong, 261031, China.
Lipids Health Dis. 2024 May 21;23(1):152. doi: 10.1186/s12944-024-02141-w.
Alzheimer's disease (AD) is a chronic neurodegenerative disorder that poses a substantial economic burden. The random forest algorithm is effective in predicting AD; however, the key factors influencing AD onset remain unclear. This study aimed to identify the key lipoprotein and metabolite factors influencing AD onset using machine-learning methods. The findings offer researchers and medical personnel new insight into AD and provide a reference for its early diagnosis, treatment, and prevention.
A total of 603 participants (controls and patients with AD) with complete lipoprotein and metabolite data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database between 2005 and 2016 were enrolled. Random forest, Lasso regression, and CatBoost algorithms were used to rank and filter 213 lipoprotein and metabolite variables. Variables that ranked consistently high in importance in at least two of the three methods were retained. Finally, the retained variables, together with the participants' age, sex, and marital status, were used to construct a random forest predictive model.
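The consensus rule described above, keeping a variable when at least two of the three methods rank it highly, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the cutoff `top_k`, the vote threshold `min_votes`, and the toy rankings are all assumptions, since the abstract does not state how "consistently high" was defined.

```python
from collections import Counter


def consensus_select(rankings, top_k, min_votes=2):
    """Return variables ranked in the top_k by at least min_votes methods.

    rankings: dict mapping a method name to a list of variable names,
              ordered from most to least important.
    top_k and min_votes are illustrative assumptions, not values from the study.
    """
    votes = Counter()
    for ordered_vars in rankings.values():
        # Each method casts one vote for each of its top_k variables.
        votes.update(set(ordered_vars[:top_k]))
    # Sort so the result does not depend on dictionary ordering.
    return sorted(name for name, count in votes.items() if count >= min_votes)


# Toy rankings with made-up orderings; var_a to var_d are placeholder names.
toy_rankings = {
    "random_forest": ["IDL_PL_PCT", "creatinine", "var_a", "var_b"],
    "lasso": ["creatinine", "var_c", "IDL_PL_PCT", "var_a"],
    "catboost": ["var_b", "IDL_PL_PCT", "var_d", "creatinine"],
}

selected = consensus_select(toy_rankings, top_k=2)
print(selected)
```

Voting on top-k membership rather than averaging ranks keeps the rule independent of how each method scales its importance scores, which differ between random forest, Lasso coefficients, and CatBoost.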
Fourteen lipoprotein and metabolite variables were selected by the three methods; with the addition of the participants' age, sex, and marital status, 17 variables were included in the AD prediction model. The optimal random forest model was built with "mtry" set to 3 and "ntree" set to 300. The model achieved an accuracy of 71.01%, a sensitivity of 79.59%, a specificity of 65.28%, and an AUC (95% CI) of 0.724 (0.645-0.804). When the variables were ranked by Mean Decrease Accuracy and Mean Decrease Gini, age, the phospholipids to total lipids ratio in intermediate-density lipoproteins (IDL_PL_PCT), and creatinine were among the top five.
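For readers less familiar with the reported metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are defined. The counts and scores are hypothetical and chosen only for easy arithmetic; they are not the study's confusion matrix, which the abstract does not report, and the function names are illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all participants classified correctly
        "sensitivity": tp / (tp + fn),   # share of AD cases correctly identified
        "specificity": tn / (tn + fp),   # share of controls correctly identified
    }


def auc_from_scores(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical counts and predicted probabilities, NOT taken from the study.
metrics = classification_metrics(tp=8, fn=2, tn=6, fp=4)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, round(auc, 3))
```

A sensitivity higher than the specificity, as reported here, means the model misses fewer AD cases at the cost of flagging more controls.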
Age, IDL_PL_PCT, and creatinine levels play crucial roles in AD onset. Regular monitoring of lipoproteins and their metabolites in older individuals is important for early AD diagnosis and prevention.