Impact of Diverse Data Sources on Computational Phenotyping.

Author information

Wang Liwei, Olson Janet E, Bielinski Suzette J, St Sauver Jennifer L, Fu Sunyang, He Huan, Cicek Mine S, Hathcock Matthew A, Cerhan James R, Liu Hongfang

Affiliations

Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.

Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States.

Publication information

Front Genet. 2020 Jun 3;11:556. doi: 10.3389/fgene.2020.00556. eCollection 2020.

DOI:10.3389/fgene.2020.00556

PMID:32582289

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7283539/

Abstract

Electronic health records (EHRs) are widely adopted and have great potential to serve as a rich, integrated source of phenotype information. Computational phenotyping, which extracts phenotypes from EHR data automatically, can accelerate the adoption and utilization of phenotype-driven efforts to advance scientific discovery and improve healthcare delivery. A number of computational phenotyping algorithms have been published, but data fragmentation, i.e., incomplete data within a single data source, has been raised as an inherent limitation of computational phenotyping. In this study, we investigated the impact of diverse data sources on two published computational phenotyping algorithms, for rheumatoid arthritis (RA) and type 2 diabetes mellitus (T2DM), using Mayo EHRs and the Rochester Epidemiology Project (REP), which links medical records from multiple health care systems. Results showed that case selection for both RA (less prevalent) and T2DM (more prevalent) was markedly impacted by data fragmentation: in Mayo data, positive predictive value (PPV) was 91.4% and 92.4% and false-negative rate (FNR) was 26.6% and 14%, respectively; in REP, PPV was 97.2% and 98.3% and FNR was 5.2% and 3.3%. T2DM controls also contained biases, with PPV of 91.2% and FNR of 1.2% for Mayo. We further elaborated the underlying reasons impacting performance.
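The abstract reports performance as PPV and FNR. As a reminder of how these two metrics are conventionally defined, here is a minimal Python sketch. The class and function names are hypothetical, and the counts are made-up illustrative values, not data from the study. It assumes that a reference standard has already sorted each patient flagged or missed by the algorithm into true positives, false positives, and false negatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm results judged against a reference standard (hypothetical helper)."""
    true_positives: int   # flagged by the algorithm and confirmed as cases
    false_positives: int  # flagged by the algorithm but not confirmed
    false_negatives: int  # true cases the algorithm failed to flag


def positive_predictive_value(counts: ConfusionCounts) -> float:
    """PPV = TP / (TP + FP): the share of flagged patients who are true cases."""
    flagged = counts.true_positives + counts.false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no patients are flagged")
    return counts.true_positives / flagged


def false_negative_rate(counts: ConfusionCounts) -> float:
    """FNR = FN / (FN + TP): the share of true cases the algorithm missed."""
    actual_cases = counts.true_positives + counts.false_negatives
    if actual_cases == 0:
        raise ValueError("FNR is undefined when there are no true cases")
    return counts.false_negatives / actual_cases


# Illustrative counts only (assumed for this example, not taken from the paper).
example = ConfusionCounts(true_positives=90, false_positives=10, false_negatives=30)
print(f"PPV = {positive_predictive_value(example):.1%}")
print(f"FNR = {false_negative_rate(example):.1%}")
```

Under these definitions, cases whose evidence sits in a data source the algorithm cannot see show up as false negatives and raise FNR. A high PPV alone therefore does not show that case selection is complete.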
