Fu Yun, Yu Yaming, Chen Weichao
Chengdu Sport University, No. 1942, Huanhu North Road, Eastern New District, Chengdu, Sichuan, China.
Sichuan Provincial Orthopedic Hospital, No. 132, West Section 1, First Ring Road, Wuhou District, Chengdu, Sichuan, China.
Sci Rep. 2025 Apr 24;15(1):14326. doi: 10.1038/s41598-025-99411-z.
Osteoarthritis is a widespread chronic joint disease whose prevalence is rising, particularly among individuals over the age of 45. It causes joint pain and dysfunction that significantly disrupt daily life. The objective of this study was to develop an optimal machine learning model for predicting the risk of osteoarthritis in individuals aged 45 and older. The study used data from the 2011 to 2018 cycles of the National Health and Nutrition Examination Survey (NHANES), comprising 2980 individuals. The dataset was randomly divided into a training set (n = 2235) and a validation set (n = 745). Five machine learning algorithms were used to build predictive models for osteoarthritis, and the SHapley Additive exPlanations (SHAP) method was used to interpret the models and identify the features that contributed most to the predictions. The cohort had an average age of 60 years and included 605 osteoarthritis diagnoses; 24 features were analyzed. After Recursive Feature Elimination (RFE) was used to select 20 features, the CatBoost model achieved an AUC of 0.8109 and an accuracy of 0.7315, making it the best-performing model. The most influential factors in the predictions were Gender, Age, BMI, Waist Circumference, and Race. This study demonstrates that the CatBoost model with 20 features can effectively predict the occurrence of osteoarthritis. Such a prediction model can help inform early interventions and patient management strategies, potentially improving patient prognosis. Further research will focus on enhancing model performance, for example by incorporating additional relevant features or refining existing ones. Validating the model in more diverse patient populations and investigating its potential for real-time use in clinical settings would further increase the study's impact and facilitate its translation into clinical practice.
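The random split and the two reported metrics (AUC and accuracy) can be illustrated with a minimal standard-library Python sketch. It is not the authors' code: the helper names `split_indices`, `roc_auc`, and `accuracy`, the fixed seed, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. Only the 2980 / 2235 / 745 counts come from the abstract.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly partition range(n_total) into training and validation indices."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    idx = list(range(n_total))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given score threshold
    (the 0.5 default is an assumption; the abstract does not state one)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Split sizes taken from the abstract: 2980 participants, 2235 for training.
train_idx, valid_idx = split_indices(2980, 2235)

# Toy labels and predicted probabilities (made up, not study data).
y_toy = [1, 0, 1, 0, 1, 0]
s_toy = [0.9, 0.2, 0.6, 0.7, 0.4, 0.1]
toy_auc = roc_auc(y_toy, s_toy)
toy_acc = accuracy(y_toy, s_toy)
```

The pairwise definition of AUC used here is quadratic in the number of cases, which is fine for a validation set of 745 but would normally be replaced by a rank-based computation or a library routine on larger data.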