Specific patterns and potential risk factors to predict 3-year risk of death among non-cancer patients with advanced chronic kidney disease by machine learning.

Affiliations

Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan.

Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan.

Publication information

Medicine (Baltimore). 2024 Feb 16;103(7):e37112. doi: 10.1097/MD.0000000000037112.

DOI:10.1097/MD.0000000000037112

PMID:38363886

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10869094/

Abstract

Chronic kidney disease (CKD) is a major public health concern. However, few machine learning studies have addressed non-cancer patients with advanced CKD, and the results of machine learning studies on cancer patients with CKD may not apply directly to non-cancer patients. We aimed to comprehensively investigate risk factors for the 3-year risk of death among non-cancer patients with advanced CKD and an estimated glomerular filtration rate < 60.0 mL/min/1.73 m² using several machine learning algorithms. In this retrospective cohort study, we collected data on inpatients and emergency care patients from 2 hospitals in northern Taiwan from 2009 to 2019, including International Classification of Diseases codes at admission and laboratory data from the hospitals' electronic medical records (EMRs). Several machine learning algorithms were used to analyze the potential impact and degree of influence of each factor on mortality and survival. A total of 6565 patients were enrolled. After data cleaning, 26 risk factors and approximately 3887 advanced CKD patients from Shuang Ho Hospital were used as the training set; the validation set contained 2299 patients from Taipei Medical University Hospital. Albumin, prothrombin time-international normalized ratio (PT-INR), and age were the top 3 risk factors, with the greatest influence on mortality prediction. In the receiver operating characteristic analysis, random forest had the highest accuracy, above 0.80. The multilayer perceptron (MLP) and AdaBoost had better sensitivity and F1-scores than the other methods. The support vector machine (SVM) with a linear kernel had the highest specificity, 0.9983, but its sensitivity and F1-score were poor. Logistic regression had the best performance, with an area under the curve of 0.8527. Applying machine learning algorithms to the EMRs of Taiwanese patients with advanced CKD could give physicians a good approximation of the patients' 3-year risk of death.
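
The abstract compares models by accuracy, sensitivity, specificity, F1-score, and area under the curve. As a minimal sketch of how these metrics relate, and of why a classifier can reach very high specificity while its sensitivity and F1-score stay poor, the Python below computes them from scratch. The function names, the toy labels and scores, and the 0.8 decision threshold are all illustrative assumptions; none of them come from the study, and this is not the authors' evaluation code.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 (death within 3 years) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity (recall), specificity, and F1 from binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 4 deaths and 6 survivors with predicted risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3, 0.3, 0.2, 0.1, 0.1]

# A deliberately conservative threshold flags few patients as high risk.
y_pred = [1 if s >= 0.8 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

With the conservative threshold, no survivor is flagged, so specificity is perfect, but most deaths are missed, which pulls sensitivity and F1 down. The AUC, by contrast, depends only on how the scores rank the cases and not on any single threshold, which is one reason the different metrics can favor different models.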
