Liou Lathan, Scott Erick, Parchure Prathamesh, Ouyang Yuxia, Egorova Natalia, Freeman Robert, Hofer Ira S, Nadkarni Girish N, Timsina Prem, Kia Arash, Levin Matthew A
Icahn School of Medicine at Mount Sinai, New York, NY, USA.
cStructure, La Jolla, CA, USA.
NPJ Digit Med. 2024 Jun 6;7(1):149. doi: 10.1038/s41746-024-01141-5.
Malnutrition is a frequently underdiagnosed condition leading to increased morbidity, mortality, and healthcare costs. The Mount Sinai Health System (MSHS) deployed a machine learning model (MUST-Plus) to detect malnutrition upon hospital admission. However, in diverse patient groups, a poorly calibrated model may lead to misdiagnosis, exacerbating healthcare disparities. We examined the model's calibration across patient variables and evaluated methods to improve it. Data from adult patients admitted to five MSHS hospitals from January 1, 2021, to December 31, 2022, were analyzed. We compared the MUST-Plus prediction to the registered dietitian's formal assessment. Hierarchical calibration was assessed and compared between the recalibration sample (N = 49,562) of patients admitted from January 1, 2021, to December 31, 2022, and the hold-out sample (N = 17,278) of patients admitted from January 1, 2023, to September 30, 2023. Statistical differences in calibration metrics were tested using bootstrapping with replacement. Before recalibration, the overall model calibration intercept was -1.17 (95% CI: -1.20, -1.14), the slope was 1.37 (95% CI: 1.34, 1.40), and the Brier score was 0.26 (95% CI: 0.25, 0.26). Both weak and moderate measures of calibration differed significantly between White and Black patients and between male and female patients. Logistic recalibration significantly improved calibration of the model across race and gender in the hold-out sample. The original MUST-Plus model showed significant differences in calibration between White and Black patients. It also overestimated malnutrition in females compared to males. Logistic recalibration effectively reduced miscalibration across all patient subgroups. Continual monitoring and timely recalibration can improve model accuracy.
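The calibration metrics and the logistic recalibration step named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the miscalibration coefficients (-1.0 and 1.5) are arbitrary, and every function name is hypothetical. It assumes one common definition of the metrics, in which the calibration intercept is fitted with the slope fixed at 1 (the log-odds of the prediction used as an offset) and the calibration slope comes from a logistic regression of the outcome on the log-odds of the prediction. The abstract does not state which exact definitions were used.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p, eps=1e-6):
    """Log-odds of p, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def fit_logistic(x, y, fit_slope=True, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of P(y=1) = sigmoid(a + b*x).
    With fit_slope=False, b stays at 1, so x acts as an offset."""
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g_a += yi - mu
            g_b += (yi - mu) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        if fit_slope:
            det = h_aa * h_bb - h_ab * h_ab
            da = (h_bb * g_a - h_ab * g_b) / det
            db = (h_aa * g_b - h_ab * g_a) / det
        else:
            da, db = g_a / h_aa, 0.0
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def brier(p, y):
    """Mean squared difference between predicted risk and outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

def calibration_metrics(p, y):
    lp = [logit(pi) for pi in p]
    intercept, _ = fit_logistic(lp, y, fit_slope=False)
    _, slope = fit_logistic(lp, y, fit_slope=True)
    return {"intercept": intercept, "slope": slope, "brier": brier(p, y)}

def simulate(n, rng):
    """Synthetic cohort whose predicted risks are miscalibrated by construction."""
    p, y = [], []
    for _ in range(n):
        lp = rng.gauss(0.0, 1.0)           # log-odds reported by the "model"
        true_p = sigmoid(-1.0 + 1.5 * lp)  # arbitrary true relationship
        p.append(sigmoid(lp))
        y.append(1 if rng.random() < true_p else 0)
    return p, y

def bootstrap_ci(p, y, stat, rng, n_boot=200):
    """Approximate 95% percentile interval from resampling with replacement."""
    n = len(y)
    vals = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        vals.append(stat([p[i] for i in idx], [y[i] for i in idx]))
    vals.sort()
    return vals[int(0.025 * n_boot)], vals[int(0.975 * n_boot) - 1]

rng = random.Random(0)
p_recal, y_recal = simulate(4000, rng)  # stands in for the recalibration sample
p_hold, y_hold = simulate(2000, rng)    # stands in for the hold-out sample

before = calibration_metrics(p_hold, y_hold)

# Logistic recalibration: learn a and b on the recalibration sample only,
# then apply p_new = sigmoid(a + b * logit(p)) to the hold-out sample.
a_hat, b_hat = fit_logistic([logit(pi) for pi in p_recal], y_recal)
p_hold_new = [sigmoid(a_hat + b_hat * logit(pi)) for pi in p_hold]

after = calibration_metrics(p_hold_new, y_hold)
ci_low, ci_high = bootstrap_ci(p_hold_new, y_hold, brier, rng)

print("before:", before)
print("after: ", after)
print("Brier 95% CI after recalibration:", (ci_low, ci_high))
```

A well-calibrated model has an intercept near 0 and a slope near 1, so in this sketch the hold-out metrics should move toward those values after recalibration. The same `calibration_metrics` call could be run separately within each subgroup (for example by race or gender) to compare calibration between groups, with the bootstrap used to test the difference.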