Generative transfer learning for measuring plausibility of EHR diagnosis records.

Affiliations

Harvard Medical School, Boston, Massachusetts, USA.

Massachusetts General Hospital, Boston, Massachusetts, USA.

Publication information

J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.

DOI:10.1093/jamia/ocaa215

PMID:33043366

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7936395/

Abstract

OBJECTIVE

Because recording health information in Electronic Health Records (EHRs) involves a complex set of processes, the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease.

MATERIALS AND METHODS

Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features).
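The abstract does not state the form of the generative classifiers. As a generic illustration only, a generative classifier models how the features are distributed within each class and then applies Bayes' rule. The symbols here are assumptions introduced for this sketch: $\mathbf{x}$ is the vector of PRISM features for one diagnosis record, and $y = 1$ means the record reflects the true disease.

```latex
P(y = 1 \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid y = 1)\, P(y = 1)}
         {\sum_{c \in \{0, 1\}} p(\mathbf{x} \mid y = c)\, P(y = c)}
```

Because the features are described as disease-agnostic, the class-conditional densities $p(\mathbf{x} \mid y)$ learned on one disease can in principle be evaluated on records of another.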

RESULTS

We demonstrate that PRISM features and the generative PRISM classifiers estimate disease probabilities effectively and exhibit distributional characteristics that generalize and transfer across diseases and patient populations. The joint probabilities learned from PRISM features by the generative PRISM models carry over to multiple diseases.
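The transfer idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian naive Bayes model as a stand-in generative classifier, two made-up numeric features in place of the PRISM features, and synthetic data for a hypothetical "source" and "target" disease. The model is fitted on the source disease only and then scores records of the target disease.

```python
import math
import random

def fit_gaussian_nb(rows, labels):
    """Fit class priors and per-feature Gaussian parameters for each class."""
    model = {}
    n = len(rows)
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in members]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / n, stats)
    return model

def posterior_true(model, row):
    """Return P(label = 1 | row) via Bayes' rule, computed in log space."""
    log_joint = {}
    for c, (prior, stats) in model.items():
        lp = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_joint[c] = lp
    top = max(log_joint.values())  # subtract the max for numerical stability
    total = sum(math.exp(lp - top) for lp in log_joint.values())
    return math.exp(log_joint[1] - top) / total

def simulate(rng, n, shift):
    """Synthetic stand-in data: two hypothetical features per diagnosis record."""
    rows, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else 0
        centre = 2.0 + shift if y == 1 else 0.0
        rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        labels.append(y)
    return rows, labels

rng = random.Random(0)
source_rows, source_labels = simulate(rng, 500, shift=0.0)  # hypothetical source disease
target_rows, target_labels = simulate(rng, 500, shift=0.3)  # hypothetical target disease

# Train on the source disease only, then score target-disease records.
model = fit_gaussian_nb(source_rows, source_labels)
probs = [posterior_true(model, r) for r in target_rows]
accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(probs, target_labels)) / len(probs)
print(f"accuracy on target disease: {accuracy:.2f}")
```

Each value in `probs` plays the role of the per-record probability that a diagnosis reflects the true disease. In this toy setup the transfer works because the two synthetic diseases share similar feature distributions, which is the property the abstract reports for PRISM features.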

DISCUSSION

The Generative Transfer Learning (GTL) approach with PRISM classifiers enables the scalable validation of computable phenotypes in EHRs without the need for domain-specific knowledge about specific disease processes.

CONCLUSION

Probabilities computed from the generative PRISM classifier can enhance and accelerate applied Machine Learning research and discoveries with EHR data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/138f/7936395/76f0d9a3e962/ocaa215f1.jpg

Similar articles

Weakly Semi-supervised phenotyping using Electronic Health records.

J Biomed Inform. 2022 Oct;134:104175. doi: 10.1016/j.jbi.2022.104175. Epub 2022 Sep 5.

Generating sequential electronic health records using dual adversarial autoencoder.

J Am Med Inform Assoc. 2020 Jul 1;27(9):1411-1419. doi: 10.1093/jamia/ocaa119.

High-throughput phenotyping with temporal sequences.

J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.

Semi-supervised Double Deep Learning Temporal Risk Prediction (SeDDLeR) with Electronic Health Records.

J Biomed Inform. 2024 Sep;157:104685. doi: 10.1016/j.jbi.2024.104685. Epub 2024 Jul 14.

Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries.

J Biomed Inform. 2019 Nov;99:103310. doi: 10.1016/j.jbi.2019.103310. Epub 2019 Oct 14.

Semi-supervised learning of the electronic health record for phenotype stratification.

J Biomed Inform. 2016 Dec;64:168-178. doi: 10.1016/j.jbi.2016.10.007. Epub 2016 Oct 12.

Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records.

J Biomed Inform. 2020 Feb;102:103364. doi: 10.1016/j.jbi.2019.103364. Epub 2019 Dec 28.

Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria.

PLoS One. 2020 Jul 2;15(7):e0235574. doi: 10.1371/journal.pone.0235574. eCollection 2020.

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

Cited by

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Structured Data Analysis.

Health Data Sci. 2025 Sep 3;5:0321. doi: 10.34133/hds.0321. eCollection 2025.

From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR.

Proc Mach Learn Res. 2024 Jun;248:182-197.

Characterization of long COVID temporal sub-phenotypes by distributed representation learning from electronic health record data: a cohort study.

EClinicalMedicine. 2023 Sep 14;64:102210. doi: 10.1016/j.eclinm.2023.102210. eCollection 2023 Oct.

Electronic health record data quality assessment and tools: a systematic review.

J Am Med Inform Assoc. 2023 Sep 25;30(10):1730-1740. doi: 10.1093/jamia/ocad120.

Machine learning approaches for electronic health records phenotyping: a methodical review.

J Am Med Inform Assoc. 2023 Jan 18;30(2):367-381. doi: 10.1093/jamia/ocac216.

Enhancing PCORnet Clinical Research Network data completeness by integrating multistate insurance claims with electronic health records in a cloud environment aligned with CMS security and privacy requirements.

J Am Med Inform Assoc. 2022 Mar 15;29(4):660-670. doi: 10.1093/jamia/ocab269.

Evolving phenotypes of non-hospitalized patients that indicate long COVID.

BMC Med. 2021 Sep 27;19(1):249. doi: 10.1186/s12916-021-02115-0.

High-throughput phenotyping with temporal sequences.

J Am Med Inform Assoc. 2021 Mar 18;28(4):772-781. doi: 10.1093/jamia/ocaa288.
