Harvard Medical School, Boston, Massachusetts, USA.
Massachusetts General Hospital, Boston, Massachusetts, USA.
J Am Med Inform Assoc. 2021 Mar 1;28(3):559-568. doi: 10.1093/jamia/ocaa215.
Because recording health information in electronic health records (EHRs) involves a complex set of processes, the truthfulness of EHR diagnosis records is questionable. We present a computational approach for estimating the probability that a single diagnosis record in the EHR reflects the true disease.
Using EHR data on 18 diseases from the Mass General Brigham (MGB) Biobank, we develop generative classifiers on a small set of disease-agnostic features from EHRs that aim to represent Patients, pRoviders, and their Interactions within the healthcare SysteM (PRISM features).
We demonstrate that PRISM features and the generative PRISM classifiers estimate disease probabilities effectively, and that their distributional characteristics generalize and transfer across diseases and patient populations. In particular, the joint probabilities that the PRISM generative models learn over the PRISM features carry over to multiple diseases.
The Generative Transfer Learning (GTL) approach with PRISM classifiers enables scalable validation of computable phenotypes in EHRs without requiring domain-specific knowledge of individual disease processes.
Probabilities computed from the generative PRISM classifier can enhance and accelerate applied machine learning research and discoveries with EHR data.
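The idea of a generative classifier that turns disease-agnostic features into a per-record probability, and is then reused on a different disease, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which generative model is used, so a Gaussian naive Bayes classifier stands in here as an assumption. The two features (count of diagnosis records and count of distinct recording providers), the toy values, and all function names are hypothetical.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Fit class priors and per-feature Gaussian parameters (assumed model)."""
    model = {}
    n = len(labels)
    for c in set(labels):
        subset = [r for r, y in zip(rows, labels) if y == c]
        params = []
        for j in range(len(rows[0])):
            vals = [r[j] for r in subset]
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            params.append((mean, var))
        model[c] = (len(subset) / n, params)
    return model

def log_joint(model, c, row):
    """log P(class = c) + sum over features of log P(x_j | class = c)."""
    prior, params = model[c]
    total = math.log(prior)
    for x, (mean, var) in zip(row, params):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return total

def prob_true_disease(model, row):
    """Posterior P(label = 1 | features) via Bayes' rule on the joint."""
    logs = {c: log_joint(model, c, row) for c in model}
    top = max(logs.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logs.items()}
    return exps[1] / sum(exps.values())

# Hypothetical labeled records for a "source" disease.
# Columns: [number of diagnosis records, number of distinct providers].
train_rows = [[1, 1], [2, 1], [1, 2], [8, 3], [10, 4], [9, 5]]
train_labels = [0, 0, 0, 1, 1, 1]  # 1 = record reflects the true disease

model = fit_gaussian_nb(train_rows, train_labels)

# Transfer step: score records of a different "target" disease with the
# same model, which is possible because the features are disease-agnostic.
p_high = prob_true_disease(model, [9, 4])
p_low = prob_true_disease(model, [1, 1])
```

Because the features describe patients, providers, and their interactions rather than any one disease, the learned joint distribution can be applied to the target disease without refitting; whether that transfer holds in practice is the empirical question the paper evaluates.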