The absence of longitudinal data limits the accuracy of high-throughput clinical phenotyping for identifying type 2 diabetes mellitus subjects.

Affiliation

Institute for Health Informatics, University of Minnesota, Twin Cities, MN, USA.

Publication information

Int J Med Inform. 2013 Apr;82(4):239-47. doi: 10.1016/j.ijmedinf.2012.05.015. Epub 2012 Jul 2.

DOI: 10.1016/j.ijmedinf.2012.05.015

PMID: 22762862

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3478423/

Abstract

PURPOSE

To evaluate the impact of insufficient longitudinal data on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm for identifying (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes.

METHODS

Retrospective study conducted at Mayo Clinic in Rochester, Minnesota. Eligible subjects were Olmsted County residents with ≥1 Mayo Clinic encounter in each of three time periods: (1) 2007, (2) 1997 through 2006, and (3) before 1997 (N = 54,283). Diabetes-relevant electronic medical record (EMR) data on diagnoses, laboratory tests, and medications were used. We applied the HTCP algorithm to categorize individuals as T2DM cases or non-diabetes controls. Treating the categorizations based on the full 11 years of data (1997-2007) as the gold standard, we compared them with categorizations based on 10 progressively shorter intervals, ranging from 1998-2007 (10-year data) to 2007 alone (1-year data). Positive predictive values (PPVs) and false-negative rates (FNRs) were calculated. McNemar tests were used to determine whether categorizations using shorter time periods differed from the gold standard. Statistical significance was defined as P < 0.05.
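The comparison described in the methods can be sketched in Python. This is a minimal illustration, not the study's code. It makes several assumptions. First, each categorization is a hypothetical mapping from patient ID to `"case"`, `"control"`, or `"unknown"`; the abstract names only cases and controls. Second, EMR events are hypothetical `(patient_id, year, item)` tuples. Third, McNemar's test is applied in its continuity-corrected chi-square form to the binary "is a case" (or "is a control") call, and the paper may have set up its McNemar tables differently. The HTCP algorithm itself is not reproduced.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical labels: the abstract names T2DM cases and non-diabetes controls;
# "unknown" is an assumption for subjects the algorithm places in neither group.
CASE, CONTROL, UNKNOWN = "case", "control", "unknown"

# Gold standard uses 1997-2007; the ten comparison windows run from
# 1998-2007 (10-year data) down to 2007-2007 (1-year data).
GOLD_WINDOW = (1997, 2007)
WINDOWS = [(start, 2007) for start in range(1998, 2008)]

Event = Tuple[str, int, str]  # hypothetical (patient_id, year, item) EMR record


def restrict_to_window(events: List[Event], start: int, end: int) -> List[Event]:
    """Keep only events dated within [start, end], inclusive."""
    return [e for e in events if start <= e[1] <= end]


def ppv_and_fnr(gold: Dict[str, str], window: Dict[str, str], label: str) -> Tuple[float, float]:
    """PPV and FNR of one window's calls for `label`, scored against the gold standard."""
    predicted = {pid for pid, lab in window.items() if lab == label}
    actual = {pid for pid, lab in gold.items() if lab == label}
    true_pos = len(predicted & actual)
    ppv = true_pos / len(predicted) if predicted else float("nan")
    fnr = (len(actual) - true_pos) / len(actual) if actual else float("nan")
    return ppv, fnr


def mcnemar(gold: Dict[str, str], window: Dict[str, str], label: str,
            correction: bool = True) -> Tuple[float, float]:
    """McNemar chi-square test on paired binary 'is `label`' calls for the same subjects."""
    # Discordant pairs: b = gold says label, window does not; c = the reverse.
    b = sum(1 for pid, lab in gold.items() if lab == label and window.get(pid) != label)
    c = sum(1 for pid, lab in gold.items() if lab != label and window.get(pid) == label)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = max(abs(b - c) - (1 if correction else 0), 0)
    stat = diff ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Toy, made-up categorizations for eight subjects (not study data).
    gold = {"p1": CASE, "p2": CASE, "p3": CASE, "p4": CASE,
            "p5": CONTROL, "p6": CONTROL, "p7": CONTROL, "p8": UNKNOWN}
    one_year = {"p1": CASE, "p2": CASE, "p3": UNKNOWN, "p4": CASE,
                "p5": CASE, "p6": UNKNOWN, "p7": CONTROL, "p8": UNKNOWN}
    print(ppv_and_fnr(gold, one_year, CASE))     # (0.75, 0.25)
    print(ppv_and_fnr(gold, one_year, CONTROL))  # (1.0, 0.6666666666666666)
    print(mcnemar(gold, one_year, CASE))         # (0.0, 1.0)
```

Applying `ppv_and_fnr` and `mcnemar` to the categorization produced for each entry of `WINDOWS` yields one PPV/FNR pair and one P value per interval. The abstract's comparison takes this same shape, with P < 0.05 as the significance threshold.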

RESULTS

We identified 2770 T2DM cases and 21,005 controls when the algorithm was applied to the 11-year data. Using 2007 data alone, the PPV and FNR were 70% and 25%, respectively, for case identification, and 59% and 67% for control identification. All time frames except the 10-year period differed significantly from the gold standard.

CONCLUSIONS

The accuracy of the algorithm decreased markedly as data were limited to shorter observation periods. This effect should be considered carefully when designing and executing HTCP algorithms.
