Improving Hospital Length of Stay Prediction through Heterogeneous Data Integration from MIMIC-III Records.

Authors

Al Musawi Ahmad F, Rana Pratip, Raha Sibtanu, Sleeman William C, Kapoor Rishabh, Ghosh Preetam

Affiliations

Department of Information Technology, University of Thi Qar, Nasiriyah, Thi Qar, Iraq.

Computer Science Department, Virginia Commonwealth University, Richmond, VA, USA.

Publication information

Res Sq. 2025 Aug 26:rs.3.rs-6753896. doi: 10.21203/rs.3.rs-6753896/v1.

DOI:10.21203/rs.3.rs-6753896/v1
PMID:40909770
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12408017/
Abstract

Accurate prediction of hospital length of stay (LoS) is a vital component in optimizing clinical workflows, resource allocation, and patient care. This study presents a comprehensive evaluation of machine learning models for both binary and multi-class LoS classification tasks using structured clinical variables, physiological measurements, and unstructured clinical notes. Four feature groups were defined: structured features (Z), including diagnoses, procedures, medications, laboratory tests, and microbiology results; MeSH-based symptoms (S); physiological signals (F); and textual representations (E). From these, seven data configurations were constructed: Z, F, E, ZS, ZSF, ZSE, and ZSEF. Five predictive models, namely Artificial Neural Networks (ANN), XGBoost, Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), were applied with and without feature selection, where categorical features and Bag-of-Words representations were reduced to varied dimensions. Results indicate that the base structured feature set (Z) alone yields strong predictive performance across tasks. Moreover, the integration of the additional data types (S, F, and E), either individually or in combination, consistently enhanced performance, with the ZSEF configuration achieving the highest F1-scores and AUC values in most cases. While the application of SMOTE did not yield substantial improvements in the global setting encompassing all hospital admissions, it demonstrated enhanced performance in disease-specific cohorts, particularly for patients admitted with lung cancer. Among the evaluated models, XGBoost and ANN demonstrated superior generalizability. These findings underscore the effectiveness of multimodal data integration and feature reduction techniques in advancing predictive modeling for hospital length of stay across diverse patient populations.
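
The configuration naming and the oversampling idea in the abstract can be illustrated with a small sketch. This is not the authors' code: the toy feature values, the `BLOCKS` dictionary, and the helper names `build_matrix`, `smote_like`, and `f1_score` are all assumptions made for illustration. It assumes each configuration name simply lists the feature groups that are concatenated column-wise, shows the interpolation step at the core of SMOTE in simplified form, and computes a binary F1-score by hand.

```python
import math
import random

# Hypothetical per-admission feature blocks (toy numbers, three admissions).
# Letters follow the abstract: Z structured, S MeSH-based symptoms,
# F physiological signals, E textual representations.
BLOCKS = {
    "Z": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "S": [[0.0], [1.0], [1.0]],
    "F": [[36.8, 72.0], [38.1, 95.0], [37.2, 80.0]],
    "E": [[0.2, 0.1, 0.0], [0.0, 0.3, 0.4], [0.1, 0.1, 0.1]],
}

# The seven configurations named in the abstract.
CONFIGS = ["Z", "F", "E", "ZS", "ZSF", "ZSE", "ZSEF"]


def build_matrix(config, blocks):
    """Concatenate, row by row, the blocks named by each letter of `config`."""
    n_rows = len(next(iter(blocks.values())))
    return [
        [value for letter in config for value in blocks[letter][i]]
        for i in range(n_rows)
    ]


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling: each synthetic row lies on the
    segment between a minority row and one of its k nearest minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (o - b) for b, o in zip(base, other)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    for config in CONFIGS:
        width = len(build_matrix(config, BLOCKS)[0])
        print(f"{config}: {width} columns")
```

In practice one would use an established implementation (for example the SMOTE class in imbalanced-learn) and apply oversampling to the training split only, so that synthetic rows never leak into evaluation data.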

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/2650adbc4def/nihpp-rs6753896v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/d9a4488524e5/nihpp-rs6753896v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/826b4216ac57/nihpp-rs6753896v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/9d5360a3371b/nihpp-rs6753896v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/d27a6bfcfbc6/nihpp-rs6753896v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/c1971b9862df/nihpp-rs6753896v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/df9e5dfcc19f/nihpp-rs6753896v1-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/74ab790829c9/nihpp-rs6753896v1-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/bdbb97903b89/nihpp-rs6753896v1-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/031b6ccc2f5d/nihpp-rs6753896v1-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c5/12408017/15c88ddf46ef/nihpp-rs6753896v1-f0011.jpg
