College of Sports, Nanjing Tech University, Nanjing, China.
School of Athletic Performance, Shanghai University of Sport, Shanghai, China.
J Affect Disord. 2025 Jan 1;368:117-126. doi: 10.1016/j.jad.2024.09.059. Epub 2024 Sep 11.
This study aimed to explore the predictive value of machine learning (ML) in mild cognitive impairment (MCI) among older adults in China and to identify important factors causing MCI.
This study selected 6434 older adults from the 2020 wave of the China Health and Retirement Longitudinal Study (CHARLS), and the dataset was divided into a training set and a test set at a ratio of 6:4. Six ML algorithms were used to construct a prediction model for MCI in older adults: logistic regression, k-nearest neighbors (KNN), support vector machine (SVM), decision tree (DT), LightGBM, and random forest (RF). The DeLong test was used to compare the ROC curves of the different models, and decision curve analysis (DCA) was used to evaluate model performance in terms of net benefit. SHAP values were then used to explain the models by quantifying each feature's contribution to the predictions. The Matthews correlation coefficient (MCC) was calculated to evaluate model performance on the imbalanced dataset. Additionally, causal analysis and counterfactual analysis were conducted to understand feature importance and variable effects.
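As an illustration of why MCC suits imbalanced data, the following is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. It is not the authors' code; the function names `confusion_counts` and `mcc_from_counts` and the toy labels are assumptions made for this example only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = MCI, 0 = no MCI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc_from_counts(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy, made-up labels: 3 positives and 5 negatives (imbalanced).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc_model = mcc_from_counts(*confusion_counts(y_true, y_pred))

# A classifier that always predicts "no MCI" is correct on 5 of 8 cases,
# yet its MCC is 0 because it carries no information about the positive class.
always_negative = [0] * len(y_true)
mcc_trivial = mcc_from_counts(*confusion_counts(y_true, always_negative))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so a model cannot score well simply by predicting the majority class.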
The area under the ROC curve (AUC) of the models ranged from 0.71 to 0.77, and the differences between models were significant (P < 0.01). The DCA results showed that LightGBM had the largest net benefit across a range of probability thresholds. Among all the models, LightGBM demonstrated the highest performance and stability. The five most important features for predicting MCI were educational level, social events, gender, relationship with children, and age. Causal analysis revealed that these variables had a significant impact on MCI, with an average treatment effect of -0.144. Counterfactual analysis further validated these findings by simulating different scenarios, such as improving educational level, increasing age, and increasing social events.
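The net benefit reported by DCA can be illustrated with the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the probability threshold. The sketch below is a minimal example under that definition, not the authors' implementation; the function names and the toy probabilities are assumptions made for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging everyone with predicted probability >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every individual regardless of the model."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy, made-up outcomes and predicted probabilities (not study data).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8, 0.2, 0.35]

nb_model = net_benefit(y_true, y_prob, 0.5)
nb_all = net_benefit_treat_all(y_true, 0.5)
```

A decision curve plots these quantities over a range of thresholds; a model is preferred at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is 0.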
ML algorithms can effectively predict MCI among older adults in China and identify the important factors causing MCI.