Combining structured and unstructured data for predictive models: a deep learning approach.

Affiliations

Department of Biomedical Informatics, The Ohio State University, 1800 Cannon Drive, Columbus, OH, 43210, USA.

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 430070, Hubei, China.

Publication details

BMC Med Inform Decis Mak. 2020 Oct 29;20(1):280. doi: 10.1186/s12911-020-01297-6.

DOI: 10.1186/s12911-020-01297-6

PMID: 33121479

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7596962/

Abstract

BACKGROUND

The broad adoption of electronic health records (EHRs) creates great opportunities to conduct health care research and to address a wide range of clinical problems. Following recent advances and successes, methods based on machine learning and deep learning have become increasingly popular in medical informatics. However, while many studies use temporal structured data for predictive modeling, they typically neglect potentially valuable information in unstructured clinical notes. Integrating heterogeneous EHR data types through deep learning techniques may improve the performance of prediction models.

METHODS

In this research, we propose two general-purpose multi-modal neural network architectures that enhance patient representation learning by combining sequential unstructured notes with structured data. The proposed fusion models use document embeddings to represent long clinical notes. They use either convolutional neural networks (CNNs) or long short-term memory (LSTM) networks to model the sequences of clinical notes and temporal signals, and one-hot encoding to represent static information. The concatenation of these representations forms the final patient representation, which is used to make predictions.
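The fusion design described in the Methods can be sketched in pure Python. This is a minimal, untrained illustration of the CNN variant only, and it rests on several assumptions:

- The document embedding for each note has already been computed. Here the embeddings are random vectors.
- Static information is a few categorical fields.
- The field names, sizes, filter width, random filters and the single logistic output layer are all hypothetical choices.

It is not the authors' implementation. A real model would learn the filters and output weights with a deep learning framework.

```python
import math
import random

def one_hot(value, categories):
    """One-hot encode one categorical value against a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]

def random_kernels(n_filters, width, dim, rng):
    """Hypothetical untrained filters: each is (width x dim weight matrix, bias)."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(width)], 0.0)
            for _ in range(n_filters)]

def conv1d_max_pool(sequence, kernels, width):
    """1-D convolution over a sequence of vectors, ReLU, then global max pooling over time."""
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    pooled = []
    for weights, bias in kernels:
        best = -math.inf
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            # Dot product between the filter and the current window of vectors.
            s = bias + sum(w * x
                           for w_row, x_row in zip(weights, window)
                           for w, x in zip(w_row, x_row))
            best = max(best, max(0.0, s))  # ReLU, then keep the maximum over time
        pooled.append(best)
    return pooled

def patient_representation(note_embeddings, signals, static, note_kernels,
                           signal_kernels, width, static_categories):
    """Concatenate note features, time-series features and one-hot static features."""
    static_vec = [v for field, value in static.items()
                  for v in one_hot(value, static_categories[field])]
    return (conv1d_max_pool(note_embeddings, note_kernels, width)
            + conv1d_max_pool(signals, signal_kernels, width)
            + static_vec)

def predict_risk(representation, weights, bias):
    """Logistic output layer giving a probability for one binary task."""
    z = bias + sum(w * x for w, x in zip(weights, representation))
    return 1.0 / (1.0 + math.exp(-z))

# Toy usage; all field names and sizes below are hypothetical.
rng = random.Random(0)
notes = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]    # 3 notes, 4-dim embeddings
vitals = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(5)]   # 5 time steps, 2 signals
static_info = {"gender": "M", "admission_type": "EMERGENCY"}
categories = {"gender": ["F", "M"],
              "admission_type": ["ELECTIVE", "EMERGENCY", "URGENT"]}

rep = patient_representation(notes, vitals, static_info,
                             random_kernels(2, 2, 4, rng), random_kernels(3, 2, 2, rng),
                             width=2, static_categories=categories)
p = predict_risk(rep, [rng.uniform(-0.1, 0.1) for _ in rep], 0.0)
print(len(rep), round(p, 3))  # length is 2 + 3 + 5 = 10
```

Max pooling over time makes the note and signal features fixed-length regardless of how many notes or time steps a stay contains. That is what allows plain concatenation with the static vector. An LSTM variant would swap `conv1d_max_pool` for a recurrent encoder, for example one whose final hidden state plays the same role.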

RESULTS

We evaluate the performance of the proposed models on three risk prediction tasks (in-hospital mortality, 30-day hospital readmission, and long length of stay) using data derived from the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Our results show that by combining unstructured clinical notes with structured data, the proposed models outperform models that use either unstructured notes or structured data alone.
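Each of the three tasks is a binary label attached to an admission. The sketch below derives all three labels from a simplified admission table. The abstract does not give the paper's exact cohort definitions, so several details are illustrative assumptions:

- The dictionary keys are loosely modeled on columns of MIMIC-III's ADMISSIONS table.
- A patient who dies in hospital is not counted as readmitted.
- The long-stay threshold is a parameter, and the 7-day value used below is only an example.

```python
from datetime import datetime, timedelta

READMISSION_WINDOW = timedelta(days=30)

def make_labels(admissions, los_threshold_days):
    """Derive mortality, 30-day readmission and long-stay labels per admission.

    Simplified, hypothetical definitions:
    - in-hospital mortality: the admission ended in death;
    - 30-day readmission: the same patient's next admission starts within
      30 days of this discharge (never counted if the patient died);
    - long length of stay: the stay lasted more than los_threshold_days.
    """
    by_subject = {}
    for adm in admissions:
        by_subject.setdefault(adm["subject_id"], []).append(adm)

    labels = {}
    for stays in by_subject.values():
        stays.sort(key=lambda a: a["admittime"])  # chronological order per patient
        for i, adm in enumerate(stays):
            died = bool(adm["hospital_expire_flag"])
            nxt = stays[i + 1] if i + 1 < len(stays) else None
            readmitted = (nxt is not None and not died
                          and nxt["admittime"] - adm["dischtime"] <= READMISSION_WINDOW)
            los_days = (adm["dischtime"] - adm["admittime"]).total_seconds() / 86400
            labels[adm["hadm_id"]] = {
                "in_hospital_mortality": int(died),
                "readmission_30d": int(readmitted),
                "long_los": int(los_days > los_threshold_days),
            }
    return labels

# Toy admissions; the 7-day threshold is an illustrative choice.
admissions = [
    {"subject_id": 1, "hadm_id": 101, "admittime": datetime(2100, 1, 1),
     "dischtime": datetime(2100, 1, 10), "hospital_expire_flag": 0},
    {"subject_id": 1, "hadm_id": 102, "admittime": datetime(2100, 1, 25),
     "dischtime": datetime(2100, 1, 27), "hospital_expire_flag": 1},
    {"subject_id": 2, "hadm_id": 201, "admittime": datetime(2100, 3, 1),
     "dischtime": datetime(2100, 3, 4), "hospital_expire_flag": 0},
]
labels = make_labels(admissions, los_threshold_days=7)
print(labels[101])  # {'in_hospital_mortality': 0, 'readmission_30d': 1, 'long_los': 1}
```

Admission 101 counts as a readmission case because the next stay begins 15 days after discharge. It also counts as a long stay because it lasted 9 days.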

CONCLUSIONS

The proposed fusion models learn better patient representations by combining structured and unstructured data. Integrating heterogeneous EHR data types helps improve the performance of prediction models and reduce errors.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/7596962/e371881c47fa/12911_2020_1297_Fig1_HTML.jpg

Similar articles

Multimodal Risk Prediction with Physiological Signals, Medical Images and Clinical Notes.

medRxiv. 2023 May 26:2023.05.18.23290207. doi: 10.1101/2023.05.18.23290207.

Representation learning for clinical time series prediction tasks in electronic health records.

BMC Med Inform Decis Mak. 2019 Dec 17;19(Suppl 8):259. doi: 10.1186/s12911-019-0985-7.

Finding the best trade-off between performance and interpretability in predicting hospital length of stay using structured and unstructured data.

PLoS One. 2023 Nov 30;18(11):e0289795. doi: 10.1371/journal.pone.0289795. eCollection 2023.

ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU.

J Biomed Inform. 2019 Oct;98:103269. doi: 10.1016/j.jbi.2019.103269. Epub 2019 Aug 17.

Applying interpretable deep learning models to identify chronic cough patients using EHR data.

Comput Methods Programs Biomed. 2021 Oct;210:106395. doi: 10.1016/j.cmpb.2021.106395. Epub 2021 Sep 4.

Prediction of multiclass surgical outcomes in glaucoma using multimodal deep learning based on free-text operative notes and structured EHR data.

J Am Med Inform Assoc. 2024 Jan 18;31(2):456-464. doi: 10.1093/jamia/ocad213.

Multimodal temporal-clinical note network for mortality prediction.

J Biomed Semantics. 2021 Feb 15;12(1):3. doi: 10.1186/s13326-021-00235-3.

Integrating Structured and Unstructured EHR Data for Predicting Mortality by Machine Learning and Latent Dirichlet Allocation Method.

Int J Environ Res Public Health. 2023 Feb 28;20(5):4340. doi: 10.3390/ijerph20054340.

Diagnosing post-traumatic stress disorder using electronic medical record data.

Health Informatics J. 2021 Oct-Dec;27(4):14604582211053259. doi: 10.1177/14604582211053259.

Cited by

Improving Hospital Length of Stay Prediction through Heterogeneous Data Integration from MIMIC-III Records.

Res Sq. 2025 Aug 26:rs.3.rs-6753896. doi: 10.21203/rs.3.rs-6753896/v1.

Predicting 30-day hospital readmissions using ClinicalT5 with structured and unstructured electronic health records.

PLoS One. 2025 Sep 2;20(9):e0328848. doi: 10.1371/journal.pone.0328848. eCollection 2025.

Diagnostic Prediction Models for Primary Care, Based on AI and Electronic Health Records: Systematic Review.

JMIR Med Inform. 2025 Aug 22;13:e62862. doi: 10.2196/62862.

Heat syndrome types prediction of traditional Chinese medicine in acute ischemic stroke through deep learning: a pilot study.

Front Pharmacol. 2025 Aug 4;16:1601601. doi: 10.3389/fphar.2025.1601601. eCollection 2025.

Machine Learning Models to Predict Risk of Maternal Morbidity and Mortality From Electronic Medical Record Data: Scoping Review.

J Med Internet Res. 2025 Aug 14;27:e68225. doi: 10.2196/68225.

Leveraging BERT for embedding ICD codes from large scale cardiovascular EMR data to understand patient diagnostic patterns.

BMC Med Inform Decis Mak. 2025 Aug 11;25(1):300. doi: 10.1186/s12911-025-03145-x.

Artificial intelligence platform to predict children's hospital care for respiratory disease using clinical, pollution, and climatic factors.

J Glob Health. 2025 Jul 21;15:04207. doi: 10.7189/jogh.15.04207.

Implicit bias in ICU electronic health record data: measurement frequencies and missing data rates of clinical variables.

BMC Med Inform Decis Mak. 2025 Jul 1;25(1):241. doi: 10.1186/s12911-025-03058-9.

Experience of Cardiovascular and Cerebrovascular Disease Surgery Patients: Sentiment Analysis Using the Korean Bidirectional Encoder Representations from Transformers (KoBERT) Model.

JMIR Med Inform. 2025 May 30;13:e65127. doi: 10.2196/65127.

Automated Risk Prediction of Post-Stroke Adverse Mental Outcomes Using Deep Learning Methods and Sequential Data.

Bioengineering (Basel). 2025 May 14;12(5):517. doi: 10.3390/bioengineering12050517.
