Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus.

Affiliation

Institute for Health Informatics, University of Minnesota, Twin Cities, Minnesota, USA.

Publication information

J Am Med Inform Assoc. 2012 Mar-Apr;19(2):219-24. doi: 10.1136/amiajnl-2011-000597. Epub 2012 Jan 16.

DOI:10.1136/amiajnl-2011-000597

PMID:22249968

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3277630/

Abstract

OBJECTIVE

To evaluate the effect of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm developed to differentiate (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes.

MATERIALS AND METHODS

This population-based study identified all Olmsted County, Minnesota residents in 2007. We used provider-linked electronic medical record data from the two healthcare centers that provide >95% of all care to county residents (ie, Olmsted Medical Center and Mayo Clinic in Rochester, Minnesota, USA). Subjects were limited to residents with one or more encounters from January 1, 2006 through December 31, 2007 at both healthcare centers. Diabetes mellitus (DM)-relevant data on diagnoses, laboratory results, and medications from both centers were obtained for this period. The algorithm was first executed using data from both centers (ie, the gold standard) and then using data from Mayo Clinic alone. Positive predictive values and false-negative rates were calculated, and the McNemar test was used to compare the categorization obtained from Mayo Clinic data alone with the gold standard. Age and sex were compared between true-positive and false-negative subjects with T2DM. Statistical significance was accepted as p<0.05.
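The McNemar comparison described above can be illustrated with a minimal sketch. It assumes the paired 2x2 table is formed on the binary outcome "classified as T2DM" and takes its discordant counts from the Results (252 subjects classified as T2DM only with two-center data, 27 only with single-center data). The abstract does not state how the authors tabulated the test or whether a continuity correction was applied, so the function name `mcnemar_chi2` and the uncorrected statistic are assumptions for illustration, not the authors' procedure.

```python
import math


def mcnemar_chi2(b, c):
    """Uncorrected McNemar chi-square test for paired binary classifications.

    b: pairs positive under the gold standard but negative under the comparison
    c: pairs negative under the gold standard but positive under the comparison
    Returns (statistic, p_value) using a chi-square distribution with 1 df.
    """
    if b + c == 0:
        raise ValueError("McNemar test is undefined without discordant pairs")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumed discordant counts for the T2DM category, taken from the Results.
stat, p_value = mcnemar_chi2(b=252, c=27)
print(f"chi-square = {stat:.2f}, significant at 0.05: {p_value < 0.05}")
```

Only the discordant pairs enter the statistic; subjects classified identically under both data sources do not affect it.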

RESULTS

With data from both medical centers, 765 subjects with T2DM (4256 non-DM subjects) were identified. When single-center data were used, 252 T2DM subjects (1573 non-DM subjects) were missed, and an additional 27 T2DM subjects (215 non-DM subjects) were identified as false positives. The positive predictive values and false-negative rates were 95.0% (513/540) and 32.9% (252/765), respectively, for T2DM subjects and 92.6% (2683/2898) and 37.0% (1573/4256), respectively, for non-DM subjects. Age and sex distributions differed between true-positive (mean age 62.1; 45% female) and false-negative (mean age 65.0; 56.0% female) T2DM subjects.
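The reported percentages follow directly from the counts above. The short sketch below reproduces that arithmetic, treating the two-center classification as the gold standard; the helper name `ppv_and_fnr` is hypothetical and used only for illustration.

```python
def ppv_and_fnr(gold_positive, missed, false_positive):
    """Derive true positives, PPV, and false-negative rate from reported counts.

    gold_positive: subjects in the category under the two-center gold standard
    missed: gold-standard subjects not found with single-center data (false negatives)
    false_positive: subjects placed in the category only by single-center data
    """
    true_positive = gold_positive - missed
    ppv = true_positive / (true_positive + false_positive)
    fnr = missed / gold_positive
    return true_positive, ppv, fnr


# Counts as reported in the Results.
t2dm = ppv_and_fnr(gold_positive=765, missed=252, false_positive=27)
non_dm = ppv_and_fnr(gold_positive=4256, missed=1573, false_positive=215)

for label, (tp, ppv, fnr) in (("T2DM", t2dm), ("non-DM", non_dm)):
    print(f"{label}: TP={tp}, PPV={ppv:.1%}, FNR={fnr:.1%}")
```

For T2DM this gives 513 true positives, so the PPV denominator is 513 + 27 = 540 and the false-negative rate denominator is the full gold-standard count of 765.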

CONCLUSION

The findings show that application of an HTCP algorithm using data from a single medical center contributes to misclassification. These findings should be considered carefully by researchers when developing and executing HTCP algorithms.
