Improving Hospital Length of Stay Prediction through Heterogeneous Data Integration from MIMIC-III Records.

Authors

Al Musawi Ahmad F, Rana Pratip, Raha Sibtanu, Sleeman William C, Kapoor Rishabh, Ghosh Preetam

Affiliations

Department of Information Technology, University of Thi Qar, Nasiriyah, Thi Qar, Iraq.

Computer Science Department, Virginia Commonwealth University, Richmond, VA, USA.

Publication information

Res Sq. 2025 Aug 26:rs.3.rs-6753896. doi: 10.21203/rs.3.rs-6753896/v1.

Abstract

Accurate prediction of hospital length of stay (LoS) is a vital component in optimizing clinical workflows, resource allocation, and patient care. This study presents a comprehensive evaluation of machine learning models for both binary and multi-class LoS classification tasks using structured clinical variables, physiological measurements, and unstructured clinical notes. Seven data configurations were constructed from four feature groups: structured features (Z), comprising diagnoses, procedures, medications, laboratory tests, and microbiology results; MeSH-based symptoms (S); physiological signals (F); and textual representations (E). The seven configurations were Z, F, E, ZS, ZSF, ZSE, and ZSEF. Five predictive models were applied: Artificial Neural Networks (ANN), XGBoost, Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Each was run with and without feature selection, in which categorical features and Bag-of-Words representations were reduced to varied dimensions. Results indicate that the base structured feature set (Z) alone yields strong predictive performance across tasks. Moreover, integrating the additional data types (S, F, and E), either individually or in combination, consistently enhanced performance, with the ZSEF configuration achieving the highest F1-scores and AUC values in most cases. While applying SMOTE did not yield substantial improvements in the global setting encompassing all hospital admissions, it improved performance in disease-specific cohorts, particularly for patients admitted with lung cancer. Among the evaluated models, XGBoost and ANN demonstrated superior generalizability. These findings underscore the effectiveness of multimodal data integration and feature reduction techniques in advancing predictive modeling for hospital length of stay across diverse patient populations.
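The seven configurations can be read as concatenations of per-admission feature blocks. The following is a minimal sketch of that idea only, not the authors' pipeline: the function name `build_config`, the dictionary layout, and the toy values are all assumptions made for illustration, and the abstract does not state how the blocks were actually joined.

```python
# Hypothetical names and toy data throughout; the source publishes no code
# for this step, so this only illustrates the idea of block concatenation.

# The seven configurations named in the abstract, each a string of block letters.
CONFIGURATIONS = ["Z", "F", "E", "ZS", "ZSF", "ZSE", "ZSEF"]


def build_config(blocks, config):
    """Concatenate per-admission feature vectors for the blocks in `config`.

    `blocks` maps a block letter to a list of rows, one row per admission.
    All blocks must cover the same admissions in the same order.
    """
    n_rows = {len(blocks[key]) for key in config}
    if len(n_rows) != 1:
        raise ValueError("feature blocks disagree on the number of admissions")
    rows = []
    for i in range(n_rows.pop()):
        row = []
        for key in config:  # keep the letter order of the configuration name
            row.extend(blocks[key][i])
        rows.append(row)
    return rows


# Toy data: two admissions with tiny made-up vectors for each block.
toy_blocks = {
    "Z": [[1, 0, 1], [0, 1, 1]],  # structured codes (multi-hot)
    "S": [[1, 0], [0, 0]],        # MeSH-based symptoms
    "F": [[98.6], [101.2]],       # physiological measurements
    "E": [[0, 2, 1], [1, 0, 0]],  # Bag-of-Words counts from notes
}

# One feature matrix per configuration, ready to hand to any classifier.
matrices = {name: build_config(toy_blocks, name) for name in CONFIGURATIONS}
```

Each entry of `matrices` could then be passed to the same set of classifiers, so that differences in F1-score or AUC reflect the feature blocks included rather than the model setup.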


