Performance assessment of different machine learning approaches in predicting diabetic ketoacidosis in adults with type 1 diabetes using electronic health records data.

Affiliations

Sanofi U.S. LLC, Bridgewater, New Jersey, USA.

Sanofi U.S. LLC, Cambridge, Massachusetts, USA.

Publication information

Pharmacoepidemiol Drug Saf. 2021 May;30(5):610-618. doi: 10.1002/pds.5199. Epub 2021 Feb 3.

Abstract

PURPOSE

To assess the performance of different machine learning (ML) approaches in identifying risk factors for diabetic ketoacidosis (DKA) and predicting DKA.

METHODS

This study applied flexible ML approaches (XGBoost, distributed random forest [DRF] and a feedforward network) and conventional ML approaches (logistic regression and least absolute shrinkage and selection operator [LASSO]) to 3400 DKA cases and 11 780 controls nested within adults with type 1 diabetes identified from the Optum® de-identified Electronic Health Record dataset (2007-2018). Area under the curve (AUC), accuracy, sensitivity and specificity were computed using fivefold cross-validation, and their 95% confidence intervals (CIs) were established using 1000 bootstrap samples. The importance of predictors was compared across the models.
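The evaluation scheme described above (fivefold cross-validation, then 1000 bootstrap resamples for 95% CIs) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, `fit_midpoint` is a hypothetical one-feature stand-in for the actual models (XGBoost, DRF, feedforward network, logistic regression, LASSO), the 0.5 classification threshold is assumed, and the CI is a percentile bootstrap over pooled out-of-fold predictions, which may differ from how the study resampled.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def auc(y: Sequence[int], s: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney) identity; tied scores share their average rank."""
    order = sorted(range(len(s)), key=lambda i: s[i])
    ranks = [0.0] * len(s)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and s[order[j + 1]] == s[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    pos_rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def threshold_metrics(y: Sequence[int], s: Sequence[float], threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity and specificity when s >= threshold is called a predicted DKA case."""
    tp = sum(1 for yi, si in zip(y, s) if yi == 1 and si >= threshold)
    tn = sum(1 for yi, si in zip(y, s) if yi == 0 and si < threshold)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return {"accuracy": (tp + tn) / len(y), "sensitivity": tp / n_pos, "specificity": tn / n_neg}


def five_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size (not stratified)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def bootstrap_ci(y, s, metric: Callable, n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample (label, score) pairs with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples lacking either cases or controls
            stats.append(metric(yb, [s[i] for i in idx]))
    stats.sort()
    return stats[round(alpha / 2 * n_boot)], stats[round((1 - alpha / 2) * n_boot) - 1]


def fit_midpoint(xs: Sequence[float], ys: Sequence[int]) -> Callable[[float], float]:
    """Hypothetical toy model: logistic score centred between the two class means of one feature.

    Assumes cases have the higher mean, as in the synthetic demo below.
    """
    m1 = sum(x for x, yi in zip(xs, ys) if yi == 1) / sum(ys)
    m0 = sum(x for x, yi in zip(xs, ys) if yi == 0) / (len(ys) - sum(ys))
    mid = (m0 + m1) / 2
    return lambda x: 1 / (1 + math.exp(-(x - mid)))


if __name__ == "__main__":
    rng = random.Random(42)
    n = 400
    y = [1 if rng.random() < 0.22 else 0 for _ in range(n)]  # ~22% cases, near 3400/15180
    x = [rng.gauss(1.5 if yi else 0.0, 1.0) for yi in y]  # one synthetic predictor
    oof = [0.0] * n  # out-of-fold scores: each patient is scored by a model that never saw it
    for test_idx in five_fold_indices(n):
        held_out = set(test_idx)
        train = [i for i in range(n) if i not in held_out]
        model = fit_midpoint([x[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            oof[i] = model(x[i])
    print("AUC", round(auc(y, oof), 3), "95% CI", bootstrap_ci(y, oof, auc))
    print(threshold_metrics(y, oof))
```

`bootstrap_ci` expects a metric that returns a single float, so a threshold metric is passed as, for example, `lambda y, s: threshold_metrics(y, s)["sensitivity"]`. Pooling out-of-fold predictions before bootstrapping is one common design choice; bootstrapping within each fold and averaging is another, and the abstract does not say which was used.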

RESULTS

In the training set, XGBoost and the feedforward network yielded higher AUC values (0.89 and 0.86, respectively) than logistic regression (0.83), LASSO (0.83) and DRF (0.81). In the test set, however, all approaches had similar AUC values (0.82; 95% CI range, 0.80-0.84). Although all models had accuracy values >0.8 and specificity values >0.9, their sensitivity values were only 0.4. Differences in these metrics across models were minimal in the test set. All approaches included some known DKA risk factors among their top 10 features. XGBoost and DRF included more laboratory measurements or vital signs than the conventional ML approaches, whereas the feedforward network included more sociodemographic features.
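The combination of high accuracy and low sensitivity follows from the class balance. As a back-of-the-envelope check, assume (an assumption, not a figure reported in the paper) that the evaluated data keep the overall case share of 3400 cases among 15 180 patients, and take the rounded sensitivity Se = 0.4 at face value. Accuracy is then a prevalence-weighted average of sensitivity and specificity (Sp):

```latex
\begin{aligned}
\pi &= \frac{3400}{3400 + 11\,780} \approx 0.224,\\
\mathrm{Accuracy} &= \pi\,\mathrm{Se} + (1-\pi)\,\mathrm{Sp}
  \approx 0.224 \times 0.4 + 0.776\,\mathrm{Sp} = 0.0896 + 0.776\,\mathrm{Sp},\\
\mathrm{Accuracy} > 0.8 &\iff \mathrm{Sp} > \frac{0.8 - 0.0896}{0.776} \approx 0.915.
\end{aligned}
```

At Sp = 0.9 this gives an accuracy of about 0.79, so an accuracy above 0.8 alongside Se of about 0.4 implies a specificity of roughly 0.92 or higher. Because roughly three quarters of the patients are controls, accuracy mostly tracks specificity and says little about how many DKA cases are detected.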

CONCLUSIONS

In our empirical study, all ML approaches demonstrated similar performance and identified overlapping, but different, top 10 predictors. The differences among the selected top predictors warrant further research.
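One way to quantify "overlapping, but different" top 10 lists is a pairwise set-overlap score. The sketch below is an illustrative assumption, not the study's method: it ranks each model's features by absolute importance (so signed LASSO or logistic regression coefficients and non-negative tree-based importances are compared by rank rather than raw scale), keeps the top k, and reports the Jaccard index for every pair of models. The function names, model names, feature names and scores are hypothetical placeholders.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def top_k(importances: Dict[str, float], k: int = 10) -> List[str]:
    """Feature names ranked by absolute importance (descending), ties broken by name."""
    return sorted(importances, key=lambda f: (-abs(importances[f]), f))[:k]


def jaccard(a: List[str], b: List[str]) -> float:
    """Share of features in either top-k list that appear in both lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def compare_top_k(model_importances: Dict[str, Dict[str, float]],
                  k: int = 10) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard overlap between the top-k feature sets of every pair of models."""
    tops = {m: top_k(imp, k) for m, imp in model_importances.items()}
    return {(m1, m2): jaccard(tops[m1], tops[m2]) for m1, m2 in combinations(sorted(tops), 2)}


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are not values from the study.
    demo = {
        "model_A": {"feat_1": 0.50, "feat_2": 0.30, "feat_3": 0.10, "feat_4": 0.05},
        "model_B": {"feat_1": 0.40, "feat_3": -0.35, "feat_4": 0.20, "feat_2": 0.01},
    }
    print(compare_top_k(demo, k=3))
```

With k = 3, model_A keeps feat_1, feat_2 and feat_3 while model_B keeps feat_1, feat_3 and feat_4, so the two lists share 2 of 4 distinct features and the overlap is 0.5. A Jaccard index ignores order within the top k; a rank-aware measure would be needed if the position of each predictor matters.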

Figure (image link): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a040/8049019/9c9d7bfdc30c/PDS-30-610-g001.jpg
