A maximum likelihood approach to electronic health record phenotyping using positive and unlabeled patients.

Author affiliations

Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

Publication information

J Am Med Inform Assoc. 2020 Jan 1;27(1):119-126. doi: 10.1093/jamia/ocz170.

DOI:10.1093/jamia/ocz170

PMID:31722396

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6913222/

Abstract

OBJECTIVE

Phenotyping patients using electronic health record (EHR) data conventionally requires labeled cases and controls. Assigning labels requires manual medical chart review and therefore is labor intensive. For some phenotypes, identifying gold-standard controls is prohibitive. We developed an accurate EHR phenotyping approach that does not require labeled controls.

MATERIALS AND METHODS

Our framework relies on a random subset of cases, which can be specified using an anchor variable that has excellent positive predictive value and a sensitivity that is independent of predictors. We propose a maximum likelihood approach that efficiently leverages data from the specified cases and unlabeled patients to develop logistic regression phenotyping models, and we compare model performance with that of existing algorithms.
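
To make the setup concrete, the following is a minimal sketch of a positive-unlabeled likelihood of this kind, not the authors' implementation. It assumes the anchor has perfect positive predictive value and labels each true case with a constant probability `c`, so that P(S=1 | x) = c * expit(b0 + b1*x). The single predictor, the simulated data, the function names (`pu_loglik`, `pu_gradient`, `fit_pu`), and the plain gradient-ascent optimizer are all illustrative assumptions.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def anchor_prob(theta, x):
    """Return (p, c, m): P(Y=1|x), labeling probability c, and P(S=1|x) = c*p."""
    b0, b1, a = theta
    p = expit(b0 + b1 * x)
    c = expit(a)  # logit parametrization keeps c between 0 and 1
    m = min(max(c * p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return p, c, m


def pu_loglik(theta, xs, ss):
    """Log-likelihood of the anchor labels ss (1 = labeled case, 0 = unlabeled)."""
    total = 0.0
    for x, s in zip(xs, ss):
        _, _, m = anchor_prob(theta, x)
        total += math.log(m) if s == 1 else math.log(1.0 - m)
    return total


def pu_gradient(theta, xs, ss):
    """Gradient of pu_loglik with respect to (b0, b1, a)."""
    g0 = g1 = ga = 0.0
    for x, s in zip(xs, ss):
        p, c, m = anchor_prob(theta, x)
        r = (s - m) / (1.0 - m)  # factor shared by all three partial derivatives
        g0 += r * (1.0 - p)
        g1 += r * (1.0 - p) * x
        ga += r * (1.0 - c)
    return (g0, g1, ga)


def fit_pu(xs, ss, n_iter=300):
    """Gradient ascent with step halving; each accepted step raises the likelihood."""
    theta = (0.0, 0.0, 0.0)
    ll = pu_loglik(theta, xs, ss)
    n = len(xs)
    for _ in range(n_iter):
        g = pu_gradient(theta, xs, ss)
        step = 1.0
        while step > 1e-8:
            cand = tuple(t + step * gi / n for t, gi in zip(theta, g))
            cand_ll = pu_loglik(cand, xs, ss)
            if cand_ll > ll:
                theta, ll = cand, cand_ll
                break
            step /= 2.0
        else:
            break  # no improving step found: treat as converged
    return theta, ll


# Simulated cohort (hypothetical values): true phenotype y is never observed by the fit.
random.seed(0)
n, true_b0, true_b1, true_c = 2000, -1.0, 2.0, 0.5
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [1 if random.random() < expit(true_b0 + true_b1 * x) else 0 for x in xs]
ss = [1 if y == 1 and random.random() < true_c else 0 for y in ys]

theta_hat, ll_hat = fit_pu(xs, ss)
b0_hat, b1_hat, a_hat = theta_hat
c_hat = expit(a_hat)  # estimated proportion of true cases that are labeled
prevalence_hat = sum(expit(b0_hat + b1_hat * x) for x in xs) / n
print(f"c_hat={c_hat:.3f}, prevalence_hat={prevalence_hat:.3f}")
```

The fit uses only the predictor values and the anchor labels. In this parametrization, `c` and the intercept are separated only through the logistic form of the model, so estimates from a short run of simple gradient ascent can be noisy; a production analysis would use a more careful optimizer and convergence check.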

RESULTS

Our method outperformed the existing algorithms on predictive accuracy in three settings: Monte Carlo simulation studies, an application identifying hypertension patients with hypokalemia requiring oral supplementation using a simulated anchor, and an application identifying primary aldosteronism patients using real-world cases and anchor variables. Our method additionally generated consistent estimates of 2 important parameters: phenotype prevalence and the proportion of true cases that are labeled.
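
The two parameters can be related by a short identity. The notation here is an assumption for illustration and is not taken from the abstract: $Y$ is the true phenotype, $S$ the anchor indicator, $q = P(Y=1)$ the prevalence, and $c = P(S=1 \mid Y=1)$ the proportion of true cases that are labeled. If the anchor has perfect positive predictive value, so that $P(S=1 \mid Y=0) = 0$, then:

```latex
\begin{aligned}
P(S=1) &= P(S=1 \mid Y=1)\,P(Y=1) + P(S=1 \mid Y=0)\,P(Y=0) \\
       &= c\,q + 0 \cdot (1-q) \\
       &= c\,q .
\end{aligned}
```

Under these assumptions the observed fraction of anchor-positive patients fixes only the product $c\,q$, so estimating either parameter separately requires additional structure, such as the phenotyping model for $P(Y=1 \mid X)$.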

DISCUSSION

Upon identification of an anchor variable that is scalable and transferable to different practices, our approach should facilitate development of scalable, transferable, and practice-specific phenotyping models.

CONCLUSIONS

Our proposed approach enables accurate semiautomated EHR phenotyping with minimal manual labeling and therefore should greatly facilitate EHR clinical decision support and research.
