
An open-source framework for end-to-end analysis of electronic health record data.

Affiliations

Institute of Computational Biology, Helmholtz Munich, Munich, Germany.

Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany.

Publication information

Nat Med. 2024 Nov;30(11):3369-3380. doi: 10.1038/s41591-024-03214-0. Epub 2024 Sep 12.

DOI: 10.1038/s41591-024-03214-0
PMID: 39266748
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11564094/
Abstract

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy's features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.
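
The abstract names survival analysis among the statistical modules. As a rough illustration of what such an analysis computes, the sketch below implements a Kaplan-Meier estimator in plain Python. It does not use or reproduce ehrapy's API: the function name `kaplan_meier` and the toy cohort are assumptions made purely for illustration, and the choice of Kaplan-Meier is ours rather than something the abstract specifies.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Hypothetical helper for illustration; not part of ehrapy.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time point
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Every patient (event or censored) leaves the risk set at their own time
    leaving = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy cohort with made-up numbers: days of follow-up and event indicator
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"day {time}: S(t) = {prob:.3f}")
```

Censored patients (event indicator 0) never lower the curve themselves; they only shrink the risk set used at later event times, which is what distinguishes this estimator from a naive fraction of survivors.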


