He Yuhui, Peng Panxin, Ying Wenwei, Wang Qinwei, Wang Yan, Liu Xiankui, Song Wenhui, Gao Yue, Li Peizhe, Wang Jie, Zhu Weijie, Gao Wenzhi, Zhou Xiaofeng, Li Xuesong, Zhou Liqun
Department of Urology, Peking University First Hospital, Beijing, China.
Department of Urology, China-Japan Friendship Hospital, Beijing, China.
Transl Androl Urol. 2022 Feb;11(2):139-148. doi: 10.21037/tau-21-780.
Quick and accurate identification of urinary calculi patients with positive urine cultures is critical to the choice of treatment strategy. Predictive models based on machine learning algorithms offer a new way to address this problem. This study aimed to evaluate the predictive value of machine learning algorithms by building urine culture prediction models from data on patients with urinary calculi.
Data were collected from four clinical centers between June 2016 and May 2019, and 2,054 cases were included in the study. The dataset was randomly split at ratios of 5:5, 6:4, and 7:3 for model construction and validation. Predictive models of urine culture outcomes were constructed and validated using logistic regression, random forest, AdaBoost, and gradient boosting decision tree (GBDT) algorithms. For each ratio, construction and validation were repeated five times independently for cross-validation. The Matthews correlation coefficient (MCC), the F1-score, and the receiver operating characteristic (ROC) curve with the area under the curve (AUC) were used to evaluate the performance of each prediction model. The additive net reclassification index (NRI) and the absolute NRI were used to assess the predictive capabilities of the models.
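The two threshold-based metrics named above can be sketched as follows. This is a minimal illustration using the standard textbook definitions of MCC and F1 for binary labels (1 = positive culture, 0 = negative); the function names and the toy labels are hypothetical and are not taken from the study's code or data.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

# Toy labels (illustrative only, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
mcc_value = matthews_corrcoef(tp, tn, fp, fn)
f1_value = f1_score(tp, fp, fn)
```

Unlike F1, MCC also takes true negatives into account, which may be one reason to report both when positive cultures are the minority class.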
Four prediction models of urine culture results in patients with urinary calculi were constructed. The mean AUCs of the logistic regression, random forest, AdaBoost, and GBDT models were 0.761 (95% CI: 0.753-0.770), 0.790 (95% CI: 0.782-0.798), 0.779 (95% CI: 0.766-0.791), and 0.831 (95% CI: 0.823-0.840), respectively. Moreover, the average MCC and F1-score of the GBDT model were 0.460 and 0.588, respectively, higher than the 0.335 and 0.501 of the logistic regression model. The additive NRI and absolute NRI comparing the GBDT model with the logistic regression model were 0.124 (95% CI: 0.106-0.142) and 0.065 (95% CI: 0.060-0.069), respectively.
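As a rough illustration of how the two NRI variants differ, the sketch below uses the common two-category definitions: the additive NRI sums the net proportion of events reclassified upward and the net proportion of non-events reclassified downward, while the absolute NRI divides the net number of correct reclassifications by the total sample size. It assumes each model's output has already been dichotomized into a predicted class (1 or 0); the abstract does not state how risk categories were defined, so that choice, the function name, and the toy data are all assumptions.

```python
def net_reclassification(y_true, old_pred, new_pred):
    """Return (additive NRI, absolute NRI) for a new model versus an old one.

    y_true, old_pred, new_pred are sequences of 0/1 values of equal length.
    """
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = n_nonevent = 0
    for y, old, new in zip(y_true, old_pred, new_pred):
        if y == 1:
            n_event += 1
            if new > old:
                up_event += 1        # event correctly moved to the higher category
            elif new < old:
                down_event += 1      # event wrongly moved to the lower category
        else:
            n_nonevent += 1
            if new > old:
                up_nonevent += 1     # non-event wrongly moved up
            elif new < old:
                down_nonevent += 1   # non-event correctly moved down
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    additive = ((up_event - down_event) / n_event
                + (down_nonevent - up_nonevent) / n_nonevent)
    absolute = (((up_event - down_event) + (down_nonevent - up_nonevent))
                / (n_event + n_nonevent))
    return additive, absolute

# Toy data (illustrative only, not study data): 4 events, 6 non-events.
y_true   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_pred = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]  # e.g. a baseline model
new_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]  # e.g. a candidate model
additive_nri, absolute_nri = net_reclassification(y_true, old_pred, new_pred)
```

Because the additive NRI weights events and non-events equally regardless of prevalence, it can be noticeably larger than the absolute NRI when events are the minority, which is consistent with the direction of the difference between the two values reported above.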
Our results indicate that machine learning algorithms may be useful tools for urine culture outcome prediction in patients with urinary calculi because they exhibit superior performance compared with the logistic regression model.