Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare.

Author information

Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.

Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Feb 14;24(1):51. doi: 10.1186/s12911-024-02449-8.

DOI:10.1186/s12911-024-02449-8

PMID:38355486

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10868117/

Abstract

BACKGROUND

Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab-based and diagnosis-based labels for seven selected outcomes at three institutions. Secondary objectives were to describe the agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.

METHODS

This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test results and one diagnosis-based label. The proportions of admissions with a positive label were presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen's Kappa), sensitivity and specificity of the diagnosis-based label were calculated for each lab-based severity level.
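The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `compare_labels`, the `Agreement` container, and the ten example admissions are all hypothetical. It assumes each admission carries one binary lab-based label (treated as the reference) and one binary diagnosis-based label.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    kappa: float
    sensitivity: float
    specificity: float


def compare_labels(lab, dx):
    """Compare diagnosis-based labels (dx) against lab-based labels (lab).

    Both inputs are equal-length sequences of 0/1 values, one per admission.
    The lab-based label is treated as the reference standard.
    """
    if len(lab) != len(dx) or len(lab) == 0:
        raise ValueError("lab and dx must be non-empty and the same length")

    # 2x2 confusion table with the lab-based label as the reference.
    tp = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 1)
    fn = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 0)
    fp = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 1)
    tn = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 0)
    n = tp + fn + fp + tn

    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    nan = float("nan")
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else nan

    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return Agreement(kappa, sensitivity, specificity)


# Hypothetical example: 10 admissions, 4 lab-positive, 3 diagnosis-positive.
lab_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
dx_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
result = compare_labels(lab_labels, dx_labels)
print(result)
```

In the study design, a comparison of this kind would be repeated for each outcome, cohort, and lab-based severity level, with the same diagnosis-based label compared against each severity threshold.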

RESULTS

The numbers of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds than for SickKids across all outcomes, with odds ratios (99.9% confidence interval) for the abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar across institutions. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids than at StanfordPeds for all severity levels.
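For readers who want to see how an odds ratio with a 99.9% confidence interval can be obtained from admission counts, the following is a minimal sketch. The abstract does not state how the intervals were computed, so the Wald interval on the log-odds scale used here is an assumption, as are the function name `odds_ratio_ci` and the example counts, which are hypothetical and not taken from the study.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.999):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: admissions with a positive label in cohort A
    b: admissions with a negative label in cohort A
    c: admissions with a positive label in cohort B
    d: admissions with a negative label in cohort B
    The Wald interval is an assumed method, not necessarily the one used in the paper.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, about 3.29 for a 99.9% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20/100 positive in cohort A versus 10/100 in cohort B.
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

A 99.9% interval is much wider than the conventional 95% interval, which is a common way to stay conservative when many outcomes and cohorts are compared at once.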

CONCLUSIONS

Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5666/10868117/1d6910c70f10/12911_2024_2449_Fig1_HTML.jpg
