Rheumatic Diseases Unit, Emek Medical Center, Afula, Israel.
Rappaport Faculty of Medicine, Technion, Haifa, Israel.
Rheumatology (Oxford). 2024 Sep 1;63(9):2411-2417. doi: 10.1093/rheumatology/keae273.
To develop a machine learning-based prediction model for identifying hyperuricemic participants at risk of developing gout.
A retrospective nationwide Israeli cohort study used the Clalit Health Insurance database of 473 124 individuals to identify adults 18 years or older with at least two serum urate measurements exceeding 6.8 mg/dl between January 2007 and December 2022. Patients with a prior gout diagnosis or on gout medications were excluded. Patients' demographic characteristics, community and hospital diagnoses, routine medication prescriptions and laboratory results were used to train a risk prediction model. A machine learning model, XGBoost, was developed to predict the risk of gout. Feature selection methods were used to identify relevant variables. The model's performance was evaluated using the receiver operating characteristic area under the curve (ROC AUC) and precision-recall AUC. The primary outcome was the diagnosis of gout among hyperuricemic patients.
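The abstract names the two evaluation metrics but not how they were computed. The sketch below shows what each metric measures, assuming binary labels (1 = gout diagnosis during follow-up, 0 = none) and one risk score per participant. It is not the authors' pipeline. The function names are hypothetical. Average precision is used here as one common step-wise estimator of the precision-recall AUC; the paper's exact estimator is not stated in the abstract.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (gout) case receives
    a higher score than a random negative case; tied scores count as 0.5.
    Quadratic pairwise loop, intended for illustration on small inputs only."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise precision-recall AUC: the mean of the precision observed at the
    rank of each positive case, ranking participants by descending score.
    Tied scores keep their input order (Python's sort is stable)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    y = [0, 0, 1, 1]            # toy labels, not study data
    s = [0.1, 0.4, 0.35, 0.8]   # toy risk scores
    print(roc_auc(y, s))             # 0.75
    print(average_precision(y, s))   # 0.8333333333333333 (= 5/6)
```

A random classifier scores about 0.5 on ROC AUC. Its expected average precision is instead close to the outcome prevalence. For that reason, the precision-recall AUC is usually read against the cohort's gout rate rather than against 0.5.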
Among the 301 385 participants with hyperuricemia included in the analysis, 15 055 (5%) were diagnosed with gout. The XGBoost model had a ROC AUC of 0.781 (95% CI 0.780-0.784) and a precision-recall AUC of 0.208 (95% CI 0.195-0.220). The variables most strongly associated with a gout diagnosis were serum uric acid level, age, hyperlipidemia, and purchases of non-steroidal anti-inflammatory drugs (NSAIDs) and diuretics. A compact model using only these five variables yielded a ROC AUC of 0.714 (95% CI 0.706-0.723) and a negative predictive value (NPV) of 95%.
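NPV is a property of both the model and the outcome prevalence. The minimal sketch below computes the cohort prevalence from the counts reported above and applies the standard Bayes expression for NPV:

$$\mathrm{NPV} = \frac{\mathrm{spec}\,(1-p)}{\mathrm{spec}\,(1-p) + (1-\mathrm{sens})\,p}$$

Here $p$ is the prevalence, $\mathrm{sens}$ is sensitivity and $\mathrm{spec}$ is specificity. The function names and the sensitivity/specificity values in the usage lines are hypothetical and chosen only for illustration. The abstract does not report the compact model's sensitivity or specificity.

```python
def prevalence(n_cases: int, n_total: int) -> float:
    """Fraction of the cohort with the outcome."""
    return n_cases / n_total


def npv_from_counts(tn: int, fn: int) -> float:
    """NPV from a confusion matrix: true negatives / all predicted negatives."""
    return tn / (tn + fn)


def npv_from_rates(sensitivity: float, specificity: float, prev: float) -> float:
    """NPV via Bayes' rule from sensitivity, specificity and prevalence."""
    true_neg = specificity * (1.0 - prev)   # share of cohort correctly ruled out
    false_neg = (1.0 - sensitivity) * prev  # share of cases missed by the model
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    p = prevalence(15_055, 301_385)  # counts reported in the abstract
    print(f"{p:.1%}")                # 5.0%
    # Hypothetical operating point, NOT taken from the study:
    print(npv_from_rates(sensitivity=0.70, specificity=0.60, prev=p))
```

Because $1-p$ is about 0.95 in this cohort, NPV is most informative when read alongside the sensitivity at the same threshold.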
The findings of this cohort study suggest that a machine learning-based prediction model had relatively good performance and high NPV for identifying hyperuricemic participants at risk of developing gout.