Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes

Authors

Libin Alexander, Treitler Jonah T, Vasaitis Tadas, Shao Yijun

Affiliations

AIM AHEAD Consortium, Georgetown-Howard Universities Center for Clinical and Translational Science (GHUCCTS), Medstar Research Health Institute, Georgetown University, Washington, D.C., USA.

Thomas Jefferson High School for Science and Technology, Alexandria, Virginia, USA.

Publication information

medRxiv. 2024 Sep 19:2024.09.18.24313889. doi: 10.1101/2024.09.18.24313889.

DOI:10.1101/2024.09.18.24313889

PMID:39371141

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11451670/

Abstract

Artificial Intelligence (AI) fairness in healthcare settings has attracted significant attention because of concerns that AI models may propagate existing health disparities. Despite ongoing research, the frequency and extent of subgroup disparities have not been sufficiently studied. In this study, we extracted a nationally representative pediatric dataset (ages 0-17, n=9,935) concerning COVID-19 test outcomes from the US National Health Interview Survey (NHIS). For subgroup disparity assessment, we trained 50 models using five machine learning algorithms. We compared the models' area under the curve (AUC) on 12 small subgroups (each <15% of the total n), defined using socioeconomic factors, with their AUC on the overall population. Our results show that subgroup disparities were prevalent (50.7%) in the models. Subgroup AUCs were generally lower, with a mean difference of 0.01, ranging from -0.29 to +0.41. Notably, the disparities were not always statistically significant: four of the 12 subgroups had statistically significant disparities across models. Additionally, we explored the efficacy of synthetic data in mitigating the identified disparities. The introduction of synthetic data reduced subgroup disparity in 57.7% of the models. The mean AUC disparities for models with synthetic data decreased on average by 0.03 via resampling and by 0.04 via generative adversarial network methods.
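The comparison of subgroup AUC with overall AUC can be illustrated with a minimal sketch. The abstract does not give the exact disparity formula, so the sign convention used here (subgroup AUC minus overall AUC) is an assumption, as are the function names `auc` and `subgroup_auc_gap` and the toy data. The significance testing mentioned in the abstract is not reproduced.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auc_gap(
    y_true: Sequence[int],
    y_score: Sequence[float],
    in_subgroup: Sequence[bool],
) -> float:
    """Subgroup AUC minus overall AUC (assumed convention).

    A negative value means the model ranks cases worse inside the subgroup
    than it does on the whole population.
    """
    sub_y = [y for y, m in zip(y_true, in_subgroup) if m]
    sub_s = [s for s, m in zip(y_score, in_subgroup) if m]
    return auc(sub_y, sub_s) - auc(y_true, y_score)


# Hypothetical toy data: the last four records belong to the small subgroup.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.1, 0.8, 0.2, 0.4, 0.6, 0.7, 0.3]
in_subgroup = [False, False, False, False, True, True, True, True]

overall = auc(y_true, y_score)
gap = subgroup_auc_gap(y_true, y_score, in_subgroup)
print(f"overall AUC = {overall:.4f}, subgroup gap = {gap:+.4f}")
```

On the toy data the overall AUC is 0.9375 and the subgroup AUC is 0.75, so the gap is -0.1875. The pairwise AUC is quadratic in sample size, which is fine for a sketch but would normally be replaced by a rank-based routine on a dataset of this study's size.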
