Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare.

Author information

Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.

Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.

Publication information

BMC Med Inform Decis Mak. 2024 Feb 14;24(1):51. doi: 10.1186/s12911-024-02449-8.

Abstract

BACKGROUND

Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for seven selected outcomes at three institutions. Secondary objectives were to describe the agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.

METHODS

This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test results and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen's Kappa), sensitivity and specificity of the diagnosis-based label were calculated at each lab-based severity level.
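The comparison described in the Methods can be illustrated with a minimal sketch. It assumes each admission carries one binary lab-based label (treated as the reference) and one binary diagnosis-based label. The function name `agreement_metrics` and the toy data are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, Sequence


def agreement_metrics(lab: Sequence[int], dx: Sequence[int]) -> Dict[str, float]:
    """Compare a diagnosis-based label (dx) against a lab-based reference (lab).

    Both inputs are 0/1 labels, one entry per admission (illustrative layout).
    """
    if len(lab) != len(dx):
        raise ValueError("lab and dx must have the same length")

    # 2x2 confusion table with the lab-based label as the reference standard.
    tp = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 1)
    fn = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 0)
    fp = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 1)
    tn = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 0)
    n = tp + fn + fp + tn

    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else math.nan

    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Made-up toy data: 10 admissions, 4 lab-positive, 3 diagnosis-positive.
toy_lab = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_dx = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = agreement_metrics(toy_lab, toy_dx)
```

In a setting like the one described, this would be repeated once per outcome, cohort and lab-based severity level, each time pairing the single diagnosis-based label with the corresponding lab-based label.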

RESULTS

The numbers of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher at StanfordPeds than at SickKids across all outcomes, with odds ratios (99.9% confidence intervals) for the abnormal diagnosis-based label ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar across institutions. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds.
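An odds ratio with a 99.9% confidence interval can be obtained from a 2x2 table of label counts. The abstract does not state how the intervals were computed, so the sketch below assumes a Wald interval on the log odds ratio, which may differ from the authors' method. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  level: float = 0.999) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval (an assumed method).

    a, b: admissions with / without a positive label in cohort 1
    c, d: admissions with / without a positive label in cohort 2
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Two-sided critical value, roughly 3.29 for a 99.9% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)

    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts for illustration only; these are not the study's data.
or_, lo_999, hi_999 = odds_ratio_ci(30, 70, 10, 90)
```

A 99.9% level gives a wider interval than the conventional 95% level, which is a conservative choice when several outcomes are compared at once.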

CONCLUSIONS

Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5666/10868117/1d6910c70f10/12911_2024_2449_Fig1_HTML.jpg
