EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records.

Author information

Luo Jiawei, Huang Shixin, Lan Lan, Yang Shu, Cao Tingqian, Yin Jin, Qiu Jiajun, Yang Xiaoyan, Guo Yingqiang, Zhou Xiaobo

Affiliations

Department of Cardiovascular Surgery and West China Biomedical Big Data Center, West China Hospital/West China School of Medicine, Sichuan University, Chengdu, Sichuan, 610041, China; Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China.

Department of Scientific Research, The People's Hospital of Yubei District of Chongqing, Chongqing, 401120, China; School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.

Publication details

Comput Methods Programs Biomed. 2025 Feb;259:108521. doi: 10.1016/j.cmpb.2024.108521. Epub 2024 Nov 24.

DOI: 10.1016/j.cmpb.2024.108521
PMID: 39615196

Abstract

OBJECTIVE

Longitudinal data from Electronic Medical Records (EMRs) are increasingly utilized to construct predictive models for various clinical tasks, offering enhanced insights into patient health. However, significant discrepancies exist in preprocessing the irregular and intricate EMR data across studies due to the absence of universally accepted tools and standardization methods. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach for optimizing the preprocessing of longitudinal, irregular EMR data, aiming to enhance research efficiency, consistency, reproducibility, and comparability.

MATERIALS AND METHODS

EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data, offering tools with a low level of encapsulation. Compared to other pipelines, EMR-LIP categorizes variables in a more granular manner, designing specific preprocessing techniques for each type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. Data processed with EMR-LIP was then used to test several renowned deep learning models on a range of commonly used benchmark tasks.
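
The abstract does not describe EMR-LIP's API or its exact per-type rules, so the following is only a generic Python sketch of one common step in this kind of pipeline. It bins irregularly timed measurements for a single patient into fixed-width intervals, averages repeated values within a bin, forward-fills empty bins, and keeps a 0/1 mask recording which values were actually observed. The input format `(hours_since_admission, variable_name, value)`, the variable names, and the function name `resample_to_grid` are hypothetical and are not taken from EMR-LIP.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

# Hypothetical input format (not EMR-LIP's): (hours_since_admission, variable_name, value)
Observation = Tuple[float, str, float]


def resample_to_grid(
    observations: List[Observation],
    variables: List[str],
    interval_hours: float,
    horizon_hours: float,
) -> Tuple[List[List[Optional[float]]], List[List[int]]]:
    """Bin irregular numeric observations into fixed-width time windows.

    Returns (values, mask): values[t][j] is the mean of variable j in bin t,
    forward-filled from earlier bins when bin t is empty (None before the first
    observation); mask[t][j] is 1 only if variable j was measured in bin t.
    """
    n_bins = int(horizon_hours // interval_hours)
    index = {name: j for j, name in enumerate(variables)}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, name, value in observations:
        # Skip unknown variables and measurements outside the observation window.
        if name not in index or t < 0 or t >= n_bins * interval_hours:
            continue
        b = int(t // interval_hours)
        sums[(b, index[name])] += value
        counts[(b, index[name])] += 1

    values: List[List[Optional[float]]] = []
    mask: List[List[int]] = []
    last: List[Optional[float]] = [None] * len(variables)
    for b in range(n_bins):
        row: List[Optional[float]] = []
        mrow: List[int] = []
        for j in range(len(variables)):
            if counts[(b, j)]:
                last[j] = sums[(b, j)] / counts[(b, j)]  # mean of repeated values
                mrow.append(1)
            else:
                mrow.append(0)
            row.append(last[j])  # carry the last observed value forward
        values.append(row)
        mask.append(mrow)
    return values, mask


# Example: two hypothetical variables sampled at irregular times, 1-hour bins over 4 hours.
obs = [(0.5, "heart_rate", 80.0), (0.9, "heart_rate", 90.0),
       (2.2, "heart_rate", 70.0), (1.5, "lactate", 2.0)]
values, mask = resample_to_grid(obs, ["heart_rate", "lactate"],
                                interval_hours=1.0, horizon_hours=4.0)
# values == [[85.0, None], [85.0, 2.0], [70.0, 2.0], [70.0, 2.0]]
# mask   == [[1, 0], [0, 1], [1, 0], [0, 0]]
```

Changing `interval_hours` changes the resampling interval, the setting the authors report varying in their experiments. Keeping the mask alongside the filled values lets a downstream model tell a fresh measurement from a carried-forward one; categorical variables would need a different rule (for example, last value plus one-hot encoding), which this numeric-only sketch does not cover.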

RESULTS

In both the MIMIC-IV and eICU-CRD databases, models based on EMR-LIP showed superior baseline performance compared to previous studies. Interestingly, using data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, achieving an AUROC of up to 0.94 for in-hospital death prediction. Additionally, models based on EMR-LIP showed stable performance across various resampling intervals and exhibited better fairness in performance across different ethnic groups.
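
AUROC, the metric quoted above for in-hospital death prediction, can be read as the probability that a randomly chosen positive case (a patient who died) receives a higher predicted risk than a randomly chosen negative case, with ties counted as one half. The Python sketch below computes it directly from that definition and adds a per-subgroup variant of the kind one might use to compare performance across ethnic groups. The function names `auroc` and `auroc_by_group` are hypothetical; the abstract does not describe the paper's own evaluation code or fairness criteria. For real datasets, `sklearn.metrics.roc_auc_score` computes the same quantity much faster than this pairwise loop.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:          # compare every positive with every negative
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(labels: Sequence[int], scores: Sequence[float],
                   groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """AUROC computed separately within each subgroup (e.g. a demographic attribute)."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


# Toy example with made-up scores (not data from the paper).
overall = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly -> 0.75
per_group = auroc_by_group([0, 1, 1, 0, 1, 0],
                           [0.2, 0.9, 0.6, 0.5, 0.5, 0.1],
                           ["A", "A", "A", "B", "B", "B"])  # {"A": 1.0, "B": 0.75}
```

A smaller gap between per-group AUROC values is one simple way to read "better fairness in performance"; the paper may use a different fairness measure.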

CONCLUSION

EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offering an end-to-end solution for model-ready data creation, and has been open-sourced for collaborative refinement by the research community.
