Department of Biomedical Informatics and Data Science, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, MD, United States.
Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea.
J Med Internet Res. 2024 Nov 22;26:e59260. doi: 10.2196/59260.
Accurate prediction of hospital length of stay (LoS) enables efficient resource management. Conventional LoS prediction models, which rely on few covariates and nonstandardized data, show limited reproducibility when applied to the general population.
In this study, we developed and validated a machine learning (ML)-based LoS prediction model for planned admissions using the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM).
Retrospective patient-level prediction models used electronic health record (EHR) data converted to the OMOP CDM (version 5.3) from Seoul National University Bundang Hospital (SNUBH) in South Korea. The study included 137,437 hospital admission episodes between January 2016 and December 2020. Covariates from the patient, condition occurrence, medication, observation, measurement, procedure, and visit occurrence tables were included in the analysis. To perform feature selection, we applied Lasso regularization in the logistic regression. The primary outcome was an LoS of 7 days or longer, while the secondary outcome was an LoS of 3 days or longer. The prediction models were developed using 6 ML algorithms, with the training and test set split in a 7:3 ratio. The performance of each model was evaluated based on the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley Additive Explanations (SHAP) analysis measured feature importance, while calibration plots assessed the reliability of the prediction models. External validation of the developed models occurred at an independent institution, the Seoul National University Hospital.
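The modeling steps described above (Lasso-based feature selection, a 7:3 train/test split, and evaluation by AUROC and AUPRC) can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the data are synthetic, the random forest stands in for any of the 6 algorithms, and the hyperparameters (`C=0.1`, `n_estimators=200`) and the use of scikit-learn are assumptions made for the example. AUPRC is approximated here with scikit-learn's average precision.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the OMOP CDM covariates (assumption; NOT study data).
# y = 1 plays the role of "LoS of 7 days or longer".
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=8, random_state=0
)

# 7:3 train/test split, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize using training statistics only, so the L1 penalty
# treats all covariates on a comparable scale.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Lasso (L1-regularized) logistic regression for feature selection:
# covariates whose coefficients shrink to (near) zero are dropped.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)
selector = SelectFromModel(lasso, threshold=1e-5).fit(X_train_s, y_train)
n_selected = int(selector.get_support().sum())
X_train_sel = selector.transform(X_train_s)
X_test_sel = selector.transform(X_test_s)

# Fit one classifier on the selected covariates (random forest as an example).
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train_sel, y_train)

# Evaluate on the held-out 30% with AUROC and AUPRC (average precision).
proba = model.predict_proba(X_test_sel)[:, 1]
auroc = roc_auc_score(y_test, proba)
auprc = average_precision_score(y_test, proba)
print(f"selected features: {n_selected}, AUROC: {auroc:.3f}, AUPRC: {auprc:.3f}")
```

Fitting the scaler and the selector on the training portion only keeps the test set untouched until evaluation, which is the usual way to avoid optimistic performance estimates.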
The final sample included 129,938 planned admission events. The Extreme Gradient Boosting (XGB) model achieved the best performance in binary classification for predicting an LoS of 7 days or longer, with an AUROC of 0.891 (95% CI 0.887-0.894) and an AUPRC of 0.819 (95% CI 0.813-0.826) on the internal test set. The Light Gradient Boosting (LGB) model performed best in multiclass classification for predicting an LoS of 3 days or longer, with an AUROC of 0.901 (95% CI 0.898-0.904) and an AUPRC of 0.770 (95% CI 0.762-0.779). The most important features contributing to the models were the operation performed, frequency of previous outpatient visits, patient admission department, age, and day of admission. The random forest (RF) model showed robust performance in the external validation set, achieving an AUROC of 0.804 (95% CI 0.802-0.807).
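The abstract reports 95% CIs for AUROC but does not say how they were obtained. One common approach is a percentile bootstrap over the test set, sketched below using only the Python standard library. The helper names `auroc` and `bootstrap_ci`, the number of resamples, and the synthetic labels and scores are all hypothetical and chosen for illustration; the study may have used a different method.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=300, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC (assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample test cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic labels and scores (NOT study data): positives score higher on average.
rng = random.Random(1)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(200)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

point = auroc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUROC {point:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

With test sets as large as the one in this study, bootstrap intervals become narrow, which is consistent with the tight CIs reported above.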
The use of the OMOP CDM in predicting hospital LoS for planned admissions demonstrates promising predictive capabilities for stays of varying durations. It underscores the advantage of standardized data in achieving reproducible results. This approach should serve as a model for enhancing operational efficiency and patient care coordination across health care settings.