Distributed learning from multiple EHR databases: Contextual embedding models for medical events.

Affiliations

Emory University, Department of Biostatistics and Bioinformatics, Atlanta, GA 30332, USA.

University of Texas, Health Science Center at Houston, School of Biomedical Informatics, Houston, TX 77030, USA.

Publication information

J Biomed Inform. 2019 Apr;92:103138. doi: 10.1016/j.jbi.2019.103138. Epub 2019 Feb 27.

DOI: 10.1016/j.jbi.2019.103138
PMID: 30825539
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6533615/
Abstract

Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data involves privacy issues, and sharing such data among multiple research sites is often infeasible because of regulatory and other hurdles. A recently published work used contextual embedding models to build a single predictive model for more than seventy common diagnoses. Despite its high predictive power, that model cannot be generalized to other institutions without sharing data. In this work, we propose a novel method, based on Distributed Noise Contrastive Estimation (Distributed NCE), for learning from multiple databases and building predictive models. We use differential privacy to safeguard the intermediate information shared among sites. A numerical study on a real dataset demonstrates that the proposed method not only builds predictive models in a distributed manner with privacy protection, but also preserves the model structure well and achieves comparable prediction accuracy. The proposed method has been implemented as a stand-alone Python library, which is available on GitHub (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use cases.
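The abstract does not spell out the algorithm, so the following is only a hypothetical sketch of the general pattern it describes, not the authors' implementation (their library is linked in the abstract). Each site computes, on its own event sequences, the gradient of a negative-sampling loss (a common simplification of NCE) for skip-gram-style embeddings of medical event codes. It clips that gradient to a fixed L2 norm and adds Gaussian noise (the Gaussian mechanism of differential privacy) before sending it. A coordinator then averages the noisy gradients and updates shared embedding matrices. All function names, hyperparameters (`window`, `k`, `clip`, `sigma`, `lr`) and the toy data are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zeros(n_rows, dim):
    return [[0.0] * dim for _ in range(n_rows)]


def local_gradient(sequences, U, V, vocab_size, window=2, k=3, rng=random):
    """Average negative-sampling gradient over one site's event sequences.

    U holds "center" embeddings and V "context" embeddings (hypothetical layout);
    the raw sequences are only read here and never leave the site.
    """
    dim = len(U[0])
    gU, gV = zeros(vocab_size, dim), zeros(vocab_size, dim)
    n_pairs = 0
    for seq in sequences:
        for i, w in enumerate(seq):
            for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                if j == i:
                    continue
                # One observed context event (label 1) plus k noise events (label 0).
                targets = [(seq[j], 1.0)] + [(rng.randrange(vocab_size), 0.0) for _ in range(k)]
                for t, label in targets:
                    g = sigmoid(dot(U[w], V[t])) - label  # d(loss)/d(score)
                    for d in range(dim):
                        gU[w][d] += g * V[t][d]
                        gV[t][d] += g * U[w][d]
                n_pairs += 1
    scale = 1.0 / max(n_pairs, 1)
    return ([[x * scale for x in row] for row in gU],
            [[x * scale for x in row] for row in gV])


def privatize(grads, clip, sigma, rng=random):
    """Clip the joint L2 norm of a site's gradients to `clip`, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for mat in grads for row in mat for x in row))
    factor = min(1.0, clip / norm) if norm > 0 else 1.0
    return [[[x * factor + rng.gauss(0.0, sigma * clip) for x in row] for row in mat]
            for mat in grads]


def distributed_step(sites, U, V, lr=0.5, clip=1.0, sigma=0.1, rng=random):
    """One round: every site sends a noisy gradient; the coordinator averages and updates."""
    vocab_size = len(U)
    noisy = [privatize(local_gradient(s, U, V, vocab_size, rng=rng), clip, sigma, rng)
             for s in sites]
    for m, M in enumerate((U, V)):
        for r in range(vocab_size):
            for d in range(len(M[r])):
                M[r][d] -= lr * sum(site[m][r][d] for site in noisy) / len(noisy)


# Toy usage: two hypothetical sites sharing a vocabulary of event codes 0..5.
rng = random.Random(0)
vocab_size, dim = 6, 4
U = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
V = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
site_a = [[0, 1, 2, 1, 0], [0, 2, 1]]
site_b = [[3, 4, 5, 4], [5, 3, 4]]
for _ in range(50):
    distributed_step([site_a, site_b], U, V, rng=rng)
```

In this sketch only clipped, noise-perturbed gradients leave a site. The noise multiplier `sigma` trades privacy for accuracy. A real deployment would also track the cumulative (epsilon, delta) privacy budget across rounds, which is omitted here.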

Similar articles

1
Distributed learning from multiple EHR databases: Contextual embedding models for medical events.
J Biomed Inform. 2019 Apr;92:103138. doi: 10.1016/j.jbi.2019.103138. Epub 2019 Feb 27.
2
Automated feature selection of predictors in electronic medical records data.
Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.
3
Distributed clinical data sharing via dynamic access-control policy transformation.
Int J Med Inform. 2016 May;89:25-31. doi: 10.1016/j.ijmedinf.2016.02.002. Epub 2016 Feb 12.
4
Robust-ODAL: Learning from heterogeneous health systems without sharing patient-level data.
Pac Symp Biocomput. 2020;25:695-706.
5
Efficient Privacy-Preserving Access Control Scheme in Electronic Health Records System.
Sensors (Basel). 2018 Oct 18;18(10):3520. doi: 10.3390/s18103520.
6
A Distributed Ensemble Approach for Mining Healthcare Data under Privacy Constraints.
Inf Sci (N Y). 2016 Feb 10;330:245-259. doi: 10.1016/j.ins.2015.10.011.
7
Prediction task guided representation learning of medical codes in EHR.
J Biomed Inform. 2018 Aug;84:1-10. doi: 10.1016/j.jbi.2018.06.013. Epub 2018 Jun 19.
8
Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding.
Drug Saf. 2019 Jan;42(1):113-122. doi: 10.1007/s40264-018-0765-9.
9
Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources.
JMIR Med Inform. 2018 May 16;6(2):e33. doi: 10.2196/medinform.9455.
10
ODAL: A one-shot distributed algorithm to perform logistic regressions on electronic health records data from multiple clinical sites.
Pac Symp Biocomput. 2019;24:30-41.

Cited by

1
ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.
J Biomed Inform. 2025 Feb;162:104761. doi: 10.1016/j.jbi.2024.104761. Epub 2025 Jan 23.
2
Accommodating time-varying heterogeneity in risk estimation under the Cox model: a transfer learning approach.
J Am Stat Assoc. 2023;118(544):2276-2287. doi: 10.1080/01621459.2023.2210336. Epub 2023 Jun 26.
3
Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study.
J Med Internet Res. 2022 Aug 3;24(8):e37486. doi: 10.2196/37486.
4
SMART COVID Navigator, a Clinical Decision Support Tool for COVID-19 Treatment: Design and Development Study.
J Med Internet Res. 2022 Feb 18;24(2):e29279. doi: 10.2196/29279.
5
Contrastive learning improves critical event prediction in COVID-19 patients.
Patterns (N Y). 2021 Dec 10;2(12):100389. doi: 10.1016/j.patter.2021.100389. Epub 2021 Oct 25.
6
Differential privacy in health research: A scoping review.
J Am Med Inform Assoc. 2021 Sep 18;28(10):2269-2276. doi: 10.1093/jamia/ocab135.
7
Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review.
J Biomed Inform. 2021 Mar;115:103671. doi: 10.1016/j.jbi.2020.103671. Epub 2020 Dec 31.
8
Federated Learning for Healthcare Informatics.
J Healthc Inform Res. 2021;5(1):1-19. doi: 10.1007/s41666-020-00082-4. Epub 2020 Nov 12.
9
Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data.
Sci Rep. 2020 Jul 28;10(1):12598. doi: 10.1038/s41598-020-69250-1.
10
Patient Representation Transfer Learning from Clinical Notes based on Hierarchical Attention Network.
AMIA Jt Summits Transl Sci Proc. 2020 May 30;2020:597-606. eCollection 2020.

References

1
Privacy-Preserving Predictive Modeling: Harmonization of Contextual Embeddings From Different Sources.
JMIR Med Inform. 2018 May 16;6(2):e33. doi: 10.2196/medinform.9455.
2
Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis.
JMIR Med Inform. 2018 Apr 13;6(2):e20. doi: 10.2196/medinform.7744.
3
Joint Learning of Representations of Medical Concepts and Words from EHR Data.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2017 Nov;2017:764-769. doi: 10.1109/BIBM.2017.8217752. Epub 2017 Dec 18.
4
Mapping Patient Trajectories using Longitudinal Extraction and Deep Learning in the MIMIC-III Critical Care Database.
Pac Symp Biocomput. 2018;23:123-132.
5
A Predictive Model for Medical Events Based on Contextual Embedding of Temporal Sequences.
JMIR Med Inform. 2016 Nov 25;4(4):e39. doi: 10.2196/medinform.5977.
6
Learning Low-Dimensional Representations of Medical Concepts.
AMIA Jt Summits Transl Sci Proc. 2016 Jul 20;2016:41-50. eCollection 2016.
7
Using recurrent neural network models for early detection of heart failure onset.
J Am Med Inform Assoc. 2017 Mar 1;24(2):361-370. doi: 10.1093/jamia/ocw112.
8
MIMIC-III, a freely accessible critical care database.
Sci Data. 2016 May 24;3:160035. doi: 10.1038/sdata.2016.35.
9
Mining Recent Temporal Patterns for Event Detection in Multivariate Time Series Data.
KDD. 2012;2012:280-288. doi: 10.1145/2339530.2339578.
10
Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data.
PLoS One. 2013 Jun 24;8(6):e66341. doi: 10.1371/journal.pone.0066341. Print 2013.