Estimating the prevalence of diabetic retinopathy in electronic health records with massive missing labels.

Author information

Liang Ye, Wang Ru, Wang Yuchen, Liu Tieming

Affiliations

Department of Statistics, Oklahoma State University, Stillwater, OK, USA.

Dell Technologies, Round Rock, TX, USA.

Publication information

Intell Based Med. 2024;10. doi: 10.1016/j.ibmed.2024.100154. Epub 2024 Jul 5.

DOI:10.1016/j.ibmed.2024.100154

PMID:39717527

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11666125/

Abstract

OBJECTIVE

This paper addresses the large number of unlabeled patients in electronic health records (EHR) who may have undiagnosed diabetic retinopathy (DR). The goal is to estimate the true DR prevalence in an EHR dataset in which 96% of labels are missing.

MATERIALS AND METHODS

The study uses the Cerner Health Facts data, with 3749 labeled DR patients and 97,876 unlabeled diabetic patients. This extensive dataset spans the demographics of the United States over the past two decades. We implemented state-of-the-art positive-unlabeled learning methods, including ensemble-based support vector machines, ensemble-based random forests, and Bayesian finite mixture modeling.
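To make the positive-unlabeled idea concrete, here is a minimal sketch of one common ensemble scheme, PU bagging: in each round a random subset of unlabeled rows is treated as provisional negatives, a base learner is fit against the labeled positives, and only the held-out unlabeled rows are scored. The abstract does not say whether the paper's ensembles work exactly this way, so treat this as an assumption. The base learner is also swapped: the paper names support vector machines and random forests, while this sketch uses a toy nearest-centroid scorer to stay dependency-free. All function names and the synthetic data are hypothetical.

```python
import math
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Average held-out scores for unlabeled rows over PU bagging rounds.

    Toy base learner (an assumption, not the paper's): nearest centroid,
    with the distance margin squashed to (0, 1) by a logistic function.
    """
    rng = random.Random(seed)
    n_unl = len(unlabeled)
    k = min(len(positives), n_unl - 1)  # always leave some rows held out
    pos_center = centroid(positives)
    totals = [0.0] * n_unl
    counts = [0] * n_unl
    for _ in range(n_rounds):
        # Treat a random subset of unlabeled rows as provisional negatives.
        drawn = set(rng.sample(range(n_unl), k))
        neg_center = centroid([unlabeled[i] for i in drawn])
        for i in range(n_unl):
            if i in drawn:
                continue  # score only rows not used for fitting this round
            x = unlabeled[i]
            margin = sq_dist(x, neg_center) - sq_dist(x, pos_center)
            margin = max(-50.0, min(50.0, margin))  # avoid exp overflow
            totals[i] += 1.0 / (1.0 + math.exp(-margin))
            counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Synthetic demo: 100 labeled positives; the unlabeled pool hides
# 200 positives among 300 true negatives.
data_rng = random.Random(42)
make = lambda mu, n: [[data_rng.gauss(mu, 1.0), data_rng.gauss(mu, 1.0)]
                      for _ in range(n)]
labeled_pos = make(2.0, 100)
unlabeled_pool = make(2.0, 200) + make(-2.0, 300)

scores = pu_bagging_scores(labeled_pos, unlabeled_pool)
hidden_pos_mean = sum(scores[:200]) / 200
true_neg_mean = sum(scores[200:]) / 300
print(f"mean score, hidden positives: {hidden_pos_mean:.2f}")
print(f"mean score, true negatives:   {true_neg_mean:.2f}")
```

On this synthetic data the hidden positives should receive clearly higher average scores than the true negatives. Averaging only held-out scores keeps a row from being scored by a model that was trained to call it negative.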

RESULTS

The estimated DR prevalence in the population represented by the Cerner EHR is approximately 25%, and the classification techniques generally achieve an AUC of around 87%. As a by-product, we derive a predictive inference on the risk of DR from a patient's personalized medical information.
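The counts in the abstract allow a quick back-of-the-envelope check of what these figures imply. The sketch below computes the share of patients without a label, the naive prevalence obtained by counting only labeled cases, and the number of undiagnosed cases implied by a 25% prevalence. It assumes the 25% figure applies to the whole cohort of labeled plus unlabeled patients, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Counts and the approximate prevalence estimate come from the abstract.
labeled_dr = 3749        # patients with a DR label
unlabeled = 97_876       # diabetic patients without a DR label
est_prevalence = 0.25    # approximate estimated DR prevalence

total = labeled_dr + unlabeled
missing_fraction = unlabeled / total    # share of patients with no label
naive_prevalence = labeled_dr / total   # prevalence if unlabeled = no DR

# Assumption: the 25% estimate refers to the whole cohort.
implied_positives = est_prevalence * total
implied_hidden = implied_positives - labeled_dr  # undiagnosed DR cases
hidden_fraction = implied_hidden / unlabeled

print(f"cohort size:                  {total}")
print(f"missing-label fraction:       {missing_fraction:.1%}")
print(f"naive prevalence:             {naive_prevalence:.1%}")
print(f"implied undiagnosed cases:    {implied_hidden:.0f}")
print(f"implied share of unlabeled:   {hidden_fraction:.1%}")
```

Under that assumption, treating every unlabeled patient as DR-free gives a prevalence of about 3.7%, far below the estimated 25%, and roughly 22% of the unlabeled patients would have undiagnosed DR.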

DISCUSSION

Missing labels are a common EHR data quality issue. Ignoring them can bias the results of EHR analyses. The problem is especially severe for DR. It is therefore important to use machine learning or statistical tools to identify which unlabeled patients are likely to have DR. The tool in this paper can help both data analysts and clinicians in their practice.
