Jiang Mingfei, Li Xiaoran
School of Public Health, Southeast University, Hunan Road, Nanjing, Jiangsu, 210009, China.
Department of Radiology, Nanjing Gaochun People's Hospital, No.53, Maoshan Road, Nanjing, 211300, China.
BMC Public Health. 2025 May 30;25(1):1999. doi: 10.1186/s12889-025-23108-1.
This study aimed to develop a machine learning system to predict social isolation risk in older adults.
Data on 6588 older adults in China, drawn from the China Health and Retirement Longitudinal Study (2015 to 2018), were analyzed. We used the light gradient boosting machine (LightGBM) algorithm to identify the most common predictors of social isolation among older adults. Using these predictors, we trained and optimized 7 models to predict the risk of social isolation: LightGBM, logistic regression, decision tree, support vector machine, random forest, gradient boosting decision tree (GBDT), and XGBoost. In addition, the Shapley additive explanations (SHAP) method was used to show how much each predictor contributed to the prediction. Statistical analysis was conducted from December 2023 to April 2024.
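SHAP attributions are built on Shapley values, which divide the gap between a model's prediction for one individual and its prediction for a reference point among the input features. The sketch below computes exact Shapley values by enumerating feature subsets. It is only an illustration of the idea, not the study's pipeline: the function `toy_risk`, its coefficients, and the feature values are hypothetical, and a single baseline row stands in for the background data that SHAP tooling would normally average over.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to one baseline row."""
    n = len(x)
    features = list(range(n))

    def value(subset):
        # Features in the subset take the individual's value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):  # size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    age, grip_strength, child_visits = z
    return (0.02 * age - 0.01 * grip_strength - 0.05 * child_visits
            + 0.001 * age * child_visits)


x = [75.0, 20.0, 1.0]         # hypothetical individual
baseline = [65.0, 30.0, 4.0]  # hypothetical reference individual
phi = shapley_values(toy_risk, x, baseline)
for name, contribution in zip(["age", "grip_strength", "child_visits"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to `toy_risk(x) - toy_risk(baseline)`, which is the additivity property that makes per-person explanations possible. Exact enumeration grows exponentially with the number of features, so it is practical only for small examples like this one.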
The GBDT model performed best, with an accuracy of 0.7247, sensitivity of 0.9207, specificity of 0.6273, F1 score of 0.6894, and area under the curve (AUC) of 0.84. In addition, the SHAP method showed that intergenerational financial support, child visits, age, left-hand grip strength, and loneliness were the most important features.
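For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, and F1 follow from confusion-matrix counts, and how the reported values relate to one another. The counts are hypothetical, and the consistency check assumes that all reported metrics were computed on the same evaluation set with social isolation as the positive class, which the abstract does not state explicitly.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def implied_prevalence(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (accuracy - specificity) / (sensitivity - specificity)


def implied_f1(prevalence, sensitivity, specificity):
    """F1 implied by prevalence, sensitivity, and specificity (as proportions)."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    fp = (1 - prevalence) * (1 - specificity)
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=40, tn=60, fn=10)
print(metrics)

# Consistency check on the reported values (assumption: same evaluation set).
p = implied_prevalence(accuracy=0.7247, sensitivity=0.9207, specificity=0.6273)
print(f"implied share of positives: {p:.3f}")
print(f"implied F1: {implied_f1(p, 0.9207, 0.6273):.4f}")
```

Under that assumption, the reported accuracy, sensitivity, and specificity imply that roughly one third of the evaluated individuals were classified as socially isolated, and the F1 score implied by those figures agrees with the reported 0.6894 to within rounding. High sensitivity paired with moderate specificity means the model misses few isolated individuals at the cost of more false alarms.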
Combining GBDT with SHAP provides a clear, individual-level explanation of the factors that drive each person's predicted risk of social isolation, along with an intuitive view of how the key features affect that risk.