Performance assessment of different machine learning approaches in predicting diabetic ketoacidosis in adults with type 1 diabetes using electronic health records data.

Affiliations

Sanofi U.S. LLC, Bridgewater, New Jersey, USA.

Sanofi U.S. LLC, Cambridge, Massachusetts, USA.

Publication information

Pharmacoepidemiol Drug Saf. 2021 May;30(5):610-618. doi: 10.1002/pds.5199. Epub 2021 Feb 3.

DOI:10.1002/pds.5199

PMID:33480091

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8049019/

Abstract

PURPOSE

To assess the performance of different machine learning (ML) approaches in identifying risk factors for diabetic ketoacidosis (DKA) and predicting DKA.

METHODS

This study applied flexible ML approaches (XGBoost, distributed random forest [DRF] and feedforward network) and conventional ML approaches (logistic regression and least absolute shrinkage and selection operator [LASSO]) to 3400 DKA cases and 11 780 controls nested in adults with type 1 diabetes identified from the Optum® de-identified Electronic Health Record dataset (2007-2018). Area under the curve (AUC), accuracy, sensitivity and specificity were computed using fivefold cross-validation, and their 95% confidence intervals (CI) were established using 1000 bootstrap samples. The importance of predictors was compared across these models.
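The evaluation scheme described above (fivefold cross-validation plus bootstrap confidence intervals) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy one-feature scorer, the simulated data, the percentile bootstrap, and all function names (`auc`, `kfold_indices`, `out_of_fold_scores`, `bootstrap_ci`) are assumptions made for illustration only, and the abstract does not say how the cross-validation folds and bootstrap resamples were combined.

```python
import random
from statistics import fmean


def auc(labels, scores):
    """AUC as the chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_scores(x, y, k=5, seed=0):
    """Score every record with a toy model fitted on the other k-1 folds.

    The 'model' (signed distance from the midpoint of the class means)
    is a stand-in for XGBoost, LASSO, etc., chosen only to keep the sketch short.
    """
    scores = [0.0] * len(x)
    for fold in kfold_indices(len(x), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        case_mean = fmean(x[i] for i in train if y[i] == 1)
        ctrl_mean = fmean(x[i] for i in train if y[i] == 0)
        sign = 1.0 if case_mean >= ctrl_mean else -1.0
        for i in fold:
            scores[i] = sign * (x[i] - (case_mean + ctrl_mean) / 2)
    return scores


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling records with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Simulated data (assumption): 30 cases and 90 controls with one noisy predictor.
rng = random.Random(42)
y = [1] * 30 + [0] * 90
x = [rng.gauss(1.0, 1.0) if label else rng.gauss(0.0, 1.0) for label in y]

oof = out_of_fold_scores(x, y)
low, high = bootstrap_ci(y, oof)
print(f"AUC {auc(y, oof):.2f} (95% CI {low:.2f}-{high:.2f})")
```

Scoring each record only with a model that never saw it is what separates the cross-validated ("test set") estimate from the training-set estimate; the bootstrap then quantifies the sampling uncertainty of that estimate.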

RESULTS

In the training set, XGBoost and feedforward network yielded higher AUC values (0.89 and 0.86, respectively) than logistic regression (0.83), LASSO (0.83) and DRF (0.81). However, the AUC values were similar (0.82) among these approaches in the test set (95% CI range, 0.80-0.84). Although accuracy was >0.8 and specificity was >0.9 for all models, sensitivity was only 0.4. The differences in these metrics across the models were minimal in the test set. All approaches selected some known risk factors for DKA among their top 10 features. XGBoost and DRF included more laboratory measurements or vital signs than the conventional ML approaches, while feedforward network included more sociodemographic characteristics.
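The combination of high accuracy and low sensitivity follows from class imbalance, which a short calculation makes concrete. The sketch below is illustrative only: it uses the case and control counts from METHODS and the reported sensitivity of 0.4, but the specificity of 0.95 is an assumed value (the abstract only states >0.9), and it assumes the metrics were computed on a sample with the same case-to-control mix. The function name `accuracy_from_rates` is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_cases, n_controls):
    """Overall accuracy implied by sensitivity, specificity and class sizes."""
    true_pos = sensitivity * n_cases      # cases correctly flagged
    true_neg = specificity * n_controls   # controls correctly cleared
    return (true_pos + true_neg) / (n_cases + n_controls)


N_CASES, N_CONTROLS = 3400, 11780  # counts reported in METHODS

# Sensitivity 0.4 is as reported; specificity 0.95 is an assumption (>0.9 reported).
model_accuracy = accuracy_from_rates(0.4, 0.95, N_CASES, N_CONTROLS)

# A trivial rule that labels everyone a control: sensitivity 0, specificity 1.
all_negative_accuracy = accuracy_from_rates(0.0, 1.0, N_CASES, N_CONTROLS)

print(f"model: {model_accuracy:.3f}, all-negative rule: {all_negative_accuracy:.3f}")
```

Under these assumptions the model's accuracy is about 0.83, while a rule that never predicts DKA already reaches about 0.78, so accuracy alone says little about how well cases are detected.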

CONCLUSIONS

In our empirical study, all ML approaches demonstrated similar performance, and identified overlapping, but different, top 10 predictors. The difference in selected top predictors needs further research.
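One simple way to quantify "overlapping, but different" top predictors is the Jaccard index between each pair of models' top-k feature sets. The sketch below is an assumption about how such a comparison could be made, not the method used in the study; the model keys and the feature names (`f1`, `f2`, ...) are placeholders and do not correspond to any predictors reported by the authors.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of features common to both lists, out of all features in either."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def pairwise_overlap(top_features):
    """Jaccard index for every pair of models, keyed by (model_1, model_2)."""
    return {
        (m1, m2): jaccard(top_features[m1], top_features[m2])
        for m1, m2 in combinations(sorted(top_features), 2)
    }


# Placeholder top-k lists (hypothetical; real lists would hold 10 features each).
top_features = {
    "logistic": ["f1", "f2", "f3", "f4"],
    "xgboost": ["f1", "f2", "f5", "f6"],
    "network": ["f1", "f7", "f8", "f9"],
}

overlap = pairwise_overlap(top_features)
for (m1, m2), score in overlap.items():
    print(f"{m1} vs {m2}: {score:.2f}")
```

A value of 1 means two models chose identical top features and 0 means they share none; similar predictive performance with low overlap would be consistent with correlated predictors carrying interchangeable information.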

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a040/8049019/9c9d7bfdc30c/PDS-30-610-g001.jpg
