Assessing calibration and bias of a deployed machine learning malnutrition prediction model within a large healthcare system.

Author information

Liou Lathan, Scott Erick, Parchure Prathamesh, Ouyang Yuxia, Egorova Natalia, Freeman Robert, Hofer Ira S, Nadkarni Girish N, Timsina Prem, Kia Arash, Levin Matthew A

Affiliations

Icahn School of Medicine at Mount Sinai, New York, NY, USA.

cStructure, La Jolla, CA, USA.

Publication information

NPJ Digit Med. 2024 Jun 6;7(1):149. doi: 10.1038/s41746-024-01141-5.

DOI: 10.1038/s41746-024-01141-5

PMID: 38844546

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11156633/

Abstract

Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating health care disparities. We explored the model's calibration across different variables and methods to improve calibration. Data from adult patients admitted to five MSHS hospitals from January 1, 2021 - December 31, 2022, were analyzed. We compared MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted between January 1, 2021 - December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted between January 1, 2023 - September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), slope was 1.37 (95% CI: 1.34, 1.40), and Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration were significantly different between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White vs. Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
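The calibration quantities named in the abstract (calibration intercept, calibration slope, Brier score, logistic recalibration, and bootstrapping with replacement) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, every function name below is hypothetical, and the intercept and slope are fitted jointly on the logit of the predicted risk, which is one common convention (some analyses instead estimate the intercept with the slope fixed at 1).

```python
import math
import random


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Numerically stable inverse of logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def brier_score(y, p):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def fit_recalibration(y, p, max_iter=50, tol=1e-8):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson; return (a, b).

    a plays the role of a calibration intercept and b of a calibration slope;
    a = 0 and b = 1 would indicate no detectable miscalibration.
    """
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from "already calibrated"
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient w.r.t. a
            g1 += (yi - mu) * xi     # gradient w.r.t. b
            h00 += w                 # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break  # degenerate sample (e.g. constant predictions)
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def recalibrate(p, a, b):
    """Apply fitted logistic recalibration to a list of predicted risks."""
    return [sigmoid(a + b * logit(pi)) for pi in p]


def bootstrap_slope_ci(y, p, n_boot=100, seed=0):
    """Percentile 95% interval for the slope, resampling with replacement."""
    rng = random.Random(seed)
    n = len(y)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(fit_recalibration([y[i] for i in idx],
                                        [p[i] for i in idx])[1])
    slopes.sort()
    return slopes[int(0.025 * n_boot)], slopes[min(n_boot - 1, int(0.975 * n_boot))]


def simulate(n, rng, a_true=-1.0, b_true=1.3):
    """Synthetic cohort: the 'model' reports sigmoid(z), the truth is shifted."""
    y, p = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.5)
        p.append(sigmoid(z))
        y.append(1 if rng.random() < sigmoid(a_true + b_true * z) else 0)
    return y, p


rng = random.Random(42)
y_fit, p_fit = simulate(2000, rng)    # stands in for a recalibration sample
y_hold, p_hold = simulate(1000, rng)  # stands in for a hold-out sample

a_hat, b_hat = fit_recalibration(y_fit, p_fit)
p_hold_recal = recalibrate(p_hold, a_hat, b_hat)
brier_before = brier_score(y_hold, p_hold)
brier_after = brier_score(y_hold, p_hold_recal)
slope_lo, slope_hi = bootstrap_slope_ci(y_fit, p_fit)

print("intercept, slope:", round(a_hat, 2), round(b_hat, 2))
print("slope 95% bootstrap interval:", round(slope_lo, 2), round(slope_hi, 2))
print("hold-out Brier before/after:", round(brier_before, 3), round(brier_after, 3))
```

In this sketch the recalibration parameters are estimated on one sample and applied unchanged to a separate hold-out sample, mirroring the split described in the abstract. Comparing subgroups would amount to running the same functions on each subgroup's rows, and the sample sizes and "true" miscalibration used here are arbitrary illustration values.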

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/83dc/11156633/e7355cb0a5f3/41746_2024_1141_Fig1_HTML.jpg
