Distributed learning from multiple EHR databases: Contextual embedding models for medical events.

Affiliations

Emory University, Department of Biostatistics and Bioinformatics, Atlanta, GA 30332, USA.

University of Texas, Health Science Center at Houston, School of Biomedical Informatics, Houston, TX 77030, USA.

Publication information

J Biomed Inform. 2019 Apr;92:103138. doi: 10.1016/j.jbi.2019.103138. Epub 2019 Feb 27.

Abstract

Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data raises privacy issues, and sharing such data among multiple research sites is often infeasible due to regulatory and other hurdles. A recently published work uses contextual embedding models and successfully builds a single predictive model for more than seventy common diagnoses. Despite its high predictive power, the model cannot be generalized to other institutions without sharing data. In this work, a novel method is proposed to learn from multiple databases and build predictive models based on Distributed Noise Contrastive Estimation (Distributed NCE). We use differential privacy to safeguard the sharing of intermediary information. A numerical study with a real dataset demonstrates that the proposed method not only builds predictive models in a distributed manner with privacy protection, but also preserves model structure well and achieves comparable prediction accuracy. The proposed methods have been implemented as a stand-alone Python library, and the implementation is available on GitHub (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use cases.
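The abstract does not say which differential privacy mechanism protects the intermediary information, so the following is only a generic sketch of one common pattern and not the paper's algorithm or the API of the linked library. It assumes each site clips its local update to a fixed L2 norm and adds Gaussian noise before sending it to a coordinator, which averages what it receives. The function names, the example vectors, and the values of `max_norm` and `noise_std` are all hypothetical; calibrating the noise to a formal privacy budget is not shown.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def privatize(vec, max_norm, noise_std, rng):
    """Clip a local update, then add independent Gaussian noise per coordinate."""
    clipped = clip_l2(vec, max_norm)
    return [v + rng.gauss(0.0, noise_std) for v in clipped]


def aggregate(site_updates, max_norm, noise_std, rng):
    """Average the privatized updates; raw updates are never combined directly."""
    noisy = [privatize(u, max_norm, noise_std, rng) for u in site_updates]
    n_sites = len(noisy)
    dim = len(noisy[0])
    return [sum(u[i] for u in noisy) / n_sites for i in range(dim)]


# Hypothetical local updates from three sites (made-up numbers, not real data).
rng = random.Random(0)
site_updates = [[0.5, -1.0, 2.0], [0.4, -0.8, 1.5], [3.0, 4.0, 0.0]]
shared = aggregate(site_updates, max_norm=1.0, noise_std=0.1, rng=rng)
print(shared)
```

In this sketch, clipping bounds how much any single site can move the average, which is what makes a fixed noise scale meaningful. A larger `noise_std` hides individual contributions better at the cost of a noisier shared update.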
