Department of Urology, Vanderbilt University Medical Center, Nashville, TN.
University of South Carolina School of Medicine, Columbia, SC.
Urology. 2022 Nov;169:52-57. doi: 10.1016/j.urology.2022.07.008. Epub 2022 Jul 16.
To help guide empiric therapy for kidney stone disease, we sought to demonstrate the feasibility of predicting 24-hour urine abnormalities using machine learning methods.
We trained a machine learning model (XGBoost [XG]) to predict 24-hour urine abnormalities from electronic health record-derived data (n = 1314) and compared it with a logistic regression (LR) model. We also evaluated an ensemble (EN) model combining the XG and LR models. Models made binary predictions of 24-hour urine abnormalities for volume, sodium, oxalate, calcium, uric acid, and citrate, as well as a multiclass prediction of pH. We evaluated performance using the area under the receiver operating characteristic curve (AUC-ROC) and identified predictors for each model.
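As an illustration of the evaluation described above, the following is a minimal sketch of how AUC-ROC can be computed and how two models' predicted probabilities might be combined. The abstract does not state how the EN model combined XG and LR, so the weighted averaging in `soft_vote` is an assumption, and the function names and toy values are hypothetical and unrelated to the study data.

```python
from typing import List, Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(p_a: Sequence[float], p_b: Sequence[float],
              weight_a: float = 0.5) -> List[float]:
    """Assumed ensemble rule: weighted average of two models' probabilities."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


# Toy, made-up labels and probabilities (1 = abnormal, 0 = normal).
y = [1, 0, 1, 0, 1, 0]
p_xg = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5]   # stand-in for XG outputs
p_lr = [0.6, 0.2, 0.7, 0.6, 0.5, 0.4]   # stand-in for LR outputs
p_en = soft_vote(p_xg, p_lr)            # stand-in for EN outputs

for name, p in [("XG", p_xg), ("LR", p_lr), ("EN", p_en)]:
    print(f"{name}: AUC-ROC = {auc_roc(y, p):.2f}")
```

The pairwise definition is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable for large cohorts.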
The XG model discriminated 24-hour urine abnormalities with fair performance, comparable to LR. The XG model most accurately predicted abnormalities of urine volume (accuracy = 98%, AUC-ROC = 0.59), uric acid (69%, 0.73), and elevated urine sodium (71%, 0.79). The LR model outperformed the XG model alone in predicting abnormalities of urinary pH (AUC-ROC = 0.66 vs 0.57) and citrate (0.69 vs 0.64). The EN model most accurately predicted abnormalities of oxalate (accuracy = 65%, AUC-ROC = 0.70) and citrate (65%, 0.69), with overall predictive performance similar to that of either XG or LR alone. Body mass index, age, and gender were the three most important features for training the models for all outcomes.
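Accuracy and AUC-ROC can diverge sharply, as in the urine volume result above (98% vs 0.59). One plausible reading, which the abstract does not state, is class imbalance: when one class is rare, a model can be right almost every time while barely ranking cases better than chance. The sketch below shows this effect on synthetic data; the labels, scores, and 0.5 threshold are all assumptions chosen for illustration and are not drawn from the study.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at a fixed probability threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic cohort: 98 cases of the majority class (0) and 2 of the rare class (1).
y = [0] * 98 + [1] * 2

# Synthetic scores: every case scores below 0.5, and the two rare-class cases
# sit in the middle of the majority-class score range.
scores = [(100 + 2 * i) / 1000 for i in range(98)] + [0.151, 0.251]

acc = accuracy(y, scores)   # every case is predicted as class 0
auc = auc_roc(y, scores)    # ranking is close to chance
print(f"accuracy = {acc:.2f}, AUC-ROC = {auc:.2f}")
```

For this reason, AUC-ROC is usually the more informative of the two figures when comparing models on outcomes with uneven class frequencies.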
Urine chemistry prediction for kidney stone disease appears to be feasible with machine learning methods. Further optimization of model performance could facilitate dietary or pharmacologic prevention.