Chen Wei, Zheng Haotian, Ye Binglin, Guo Tiefeng, Xu Yude, Fu Zhibin, Ji Xing, Chai Xiping, Li Shenghua, Deng Qiang
Clinical College of Chinese Medicine, Gansu University of Chinese Medicine, Lanzhou, Gansu, China.
Department of Orthopaedics, Traditional Chinese Medical Hospital of Gansu Province, Qilihe District, Guazhou Street 418, Lanzhou, 730050, Gansu, China.
Sci Rep. 2025 Jan 11;15(1):1703. doi: 10.1038/s41598-025-85945-9.
Knee osteoarthritis (KOA) is a progressive degenerative disorder characterized by the gradual erosion of articular cartilage. This study aimed to develop and validate biomarker-based predictive models for KOA diagnosis using machine learning techniques. Clinical data from 2594 samples were obtained and stratified into training and validation datasets at a 7:3 ratio. Key clinical features were identified through differential analysis between the KOA and control groups, combined with least absolute shrinkage and selection operator (LASSO) regression. The SHapley Additive exPlanations (SHAP) method was employed to rank feature importance quantitatively. Based on these rankings, predictive models were constructed using Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT) algorithms. Models were developed for subsets of variables comprising the top 5, top 10, and top 15 features, as well as all identified features. Receiver operating characteristic (ROC) curves were used to compare diagnostic performance across models. Additionally, a risk stratification framework for KOA prediction was designed using recursive partitioning analysis (RPA). Differential analysis and LASSO identified 44 critical clinical features. Among these, age, plasma prothrombin time, gender, body mass index (BMI), and prothrombin time international normalized ratio (PTINR) emerged as the top five features, with SHAP values of 0.1990, 0.0981, 0.0471, 0.0433, and 0.0422, respectively. Machine learning analysis demonstrated that these variables provided robust diagnostic performance for KOA. In the training set, the area under the curve (AUC) values for the LR, RF, XGBoost, NB, SVM, and DT models were 0.947, 0.961, 0.892, 0.952, 0.885, and 0.779, respectively. In the validation set, the corresponding AUC values were 0.961, 0.943, 0.789, 0.957, 0.824, and 0.76.
Among them, RF consistently exhibited superior diagnostic accuracy for KOA. Additionally, RPA indicated a higher prevalence of KOA among individuals aged 54 years and older. Integrating the top five clinical variables significantly enhanced diagnostic accuracy for KOA, particularly with the RF model. Moreover, the RPA model offered valuable insights to assist clinicians in refining prognostic assessments and optimizing clinical decision-making.
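The AUC values above summarize each model's ROC curve as a single number. As a hedged illustration (the abstract does not describe the authors' code), the sketch below shows a class-stratified 7:3 split and a rank-based AUC, computed as the probability that a randomly chosen KOA case receives a higher model score than a randomly chosen control. Stratifying by class label is an assumption about what "stratified" means here; the function names `stratified_split` and `roc_auc` and the toy labels and scores are hypothetical. In practice a library routine such as scikit-learn's `train_test_split(..., stratify=y)` and `roc_auc_score` would typically be used instead.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: split sample indices into training/validation sets
    class by class, so the KOA:control ratio is about the same in both subsets."""
    rng = random.Random(seed)
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)  # about 70% of this class goes to training
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability that a
    random positive (KOA, label 1) outscores a random negative (control, label 0),
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Synthetic toy data (1 = KOA, 0 = control); not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.95, 0.5]
    train_idx, valid_idx = stratified_split(labels)
    print("train:", train_idx, "validation:", valid_idx)
    print("AUC:", roc_auc(labels, scores))  # 23 of 25 positive/negative pairs ranked correctly -> 0.92
```

The pairwise loop costs O(n_pos × n_neg), which is acceptable for a few thousand samples; a sort-based rank formula gives the same value in O(n log n).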
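The age threshold of 54 years reported above is the kind of cutoff recursive partitioning produces: the method searches candidate thresholds, keeps the one that best separates KOA cases from controls, and repeats the search within each resulting subgroup. The sketch below is a minimal CART-style illustration that assumes Gini impurity as the split criterion and `feature >= cutoff` rules; the abstract does not state the criterion, stopping rules, or software the authors used. The functions `gini`, `best_split`, and `partition` and the toy age/BMI values are hypothetical, and the synthetic data are constructed so that the cutoff falls at 54. They do not reproduce the study's result.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = KOA, 0 = control)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows: List[Record], labels: List[int], features: Sequence[str],
               min_leaf: int = 1) -> Optional[Tuple[str, float, float]]:
    """Try every feature and every observed cutoff for the rule `feature >= cutoff`;
    return (feature, cutoff, impurity_reduction) for the best one, or None."""
    parent = gini(labels)
    n = len(labels)
    best: Optional[Tuple[str, float, float]] = None
    for f in features:
        for cut in sorted({r[f] for r in rows})[1:]:  # the minimum would not split anything
            high = [y for r, y in zip(rows, labels) if r[f] >= cut]
            low = [y for r, y in zip(rows, labels) if r[f] < cut]
            if len(high) < min_leaf or len(low) < min_leaf:
                continue
            child = (len(high) * gini(high) + len(low) * gini(low)) / n
            gain = parent - child
            if best is None or gain > best[2]:
                best = (f, cut, gain)
    return best


def partition(rows: List[Record], labels: List[int], features: Sequence[str],
              depth: int = 0, max_depth: int = 2, min_leaf: int = 1) -> dict:
    """Recursively split the sample; each leaf is a risk group reported with its
    size and KOA prevalence."""
    node: dict = {"n": len(labels), "prevalence": sum(labels) / len(labels)}
    split = best_split(rows, labels, features, min_leaf) if depth < max_depth else None
    if split is None or split[2] <= 0.0:
        return node  # leaf: depth limit reached or no split improves purity
    f, cut, _ = split
    high_idx = [i for i, r in enumerate(rows) if r[f] >= cut]
    low_idx = [i for i, r in enumerate(rows) if r[f] < cut]
    node["feature"], node["cutoff"] = f, cut
    node["high"] = partition([rows[i] for i in high_idx], [labels[i] for i in high_idx],
                             features, depth + 1, max_depth, min_leaf)
    node["low"] = partition([rows[i] for i in low_idx], [labels[i] for i in low_idx],
                            features, depth + 1, max_depth, min_leaf)
    return node


if __name__ == "__main__":
    # Synthetic toy cohort, built so that age separates the groups at 54.
    ages = [40, 45, 48, 50, 52, 54, 58, 62, 66, 70]
    bmis = [22, 30, 25, 28, 24, 27, 23, 29, 26, 31]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    rows = [{"age": a, "bmi": b} for a, b in zip(ages, bmis)]
    tree = partition(rows, labels, ["age", "bmi"])
    print(tree["feature"], tree["cutoff"])  # age 54
    print(tree["high"]["prevalence"], tree["low"]["prevalence"])  # 1.0 0.0
```

Reporting each leaf's size and prevalence mirrors how RPA output is usually read as risk groups; real analyses would add a minimum node size and pruning to avoid overfitting small subgroups.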