A maximum likelihood approach to electronic health record phenotyping using positive and unlabeled patients.

Author information

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

J Am Med Inform Assoc. 2020 Jan 1;27(1):119-126. doi: 10.1093/jamia/ocz170.

Abstract

OBJECTIVE

Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and is therefore labor intensive. For some phenotypes, identifying gold-standard controls is prohibitively difficult. We developed an accurate EHR phenotyping approach that does not require labeled controls.

MATERIALS AND METHODS

Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and a sensitivity that is independent of the predictors. We proposed a maximum likelihood approach that efficiently leverages data from the specified cases and the unlabeled patients to develop logistic regression phenotyping models, and we compared its performance with that of existing algorithms.
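The idea can be illustrated with a minimal sketch. It is not the authors' implementation; it uses one common positive-and-unlabeled likelihood that matches the assumptions stated above. If the anchor has perfect positive predictive value and labels a fixed proportion c of true cases regardless of the predictors, then P(labeled | x) = c · expit(x'β), and β and c can be estimated jointly by maximizing the Bernoulli likelihood of the label indicator. The function name `fit_pu_logistic`, the simulated data, and all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_pu_logistic(X, s):
    """Fit a logistic phenotype model from positive-and-unlabeled data.

    X : (n, p) predictor matrix, without an intercept column
    s : (n,) indicator, True/1 = anchor-positive (labeled case), else unlabeled
    Returns (beta, c): coefficients with the intercept first, and the
    estimated proportion of true cases that are labeled.
    """
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    s = np.asarray(s, dtype=float)

    def neg_loglik(theta):
        beta, c = theta[:-1], expit(theta[-1])  # logit scale keeps c in (0, 1)
        p = expit(X1 @ beta)                    # P(true case | x)
        q = np.clip(c * p, 1e-12, 1 - 1e-12)    # P(labeled | x)
        return -np.mean(s * np.log(q) + (1 - s) * np.log(1 - q))

    theta0 = np.zeros(X1.shape[1] + 1)
    res = minimize(neg_loglik, theta0, method="BFGS")
    return res.x[:-1], expit(res.x[-1])


# Synthetic example (assumed values, not data from the study).
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=(n, 1))
y = rng.random(n) < expit(-1.0 + 2.5 * x[:, 0])  # true, unobserved phenotype
s = y & (rng.random(n) < 0.4)                    # anchor labels 40% of cases

beta_hat, c_hat = fit_pu_logistic(x, s)
p_hat = expit(beta_hat[0] + x @ beta_hat[1:])    # predicted P(case | x)
prevalence_hat = p_hat.mean()                    # estimated phenotype prevalence
print(f"c_hat={c_hat:.3f}, prevalence_hat={prevalence_hat:.3f}")
```

In this sketch the two quantities the abstract calls important parameters fall out directly: `c_hat` estimates the proportion of true cases that are labeled, and the mean of the fitted probabilities estimates phenotype prevalence. Both depend on the logistic form being correctly specified, so they should be read as model-based estimates.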

RESULTS

Our method outperformed the existing algorithms in predictive accuracy in 3 settings: Monte Carlo simulation studies, an application identifying hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and an application identifying primary aldosteronism patients using real-world cases and anchor variables. Our method also generated consistent estimates of 2 important parameters: phenotype prevalence and the proportion of true cases that are labeled.

DISCUSSION

Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models.

CONCLUSIONS

Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
