Precision Population Science Lab, Mayo Clinic, Rochester, Minnesota, USA.
Artificial Intelligence Program of Department of Pediatric and Adolescent Medicine, Mayo Clinic, Rochester, Minnesota, USA.
J Am Med Inform Assoc. 2022 Jun 14;29(7):1142-1151. doi: 10.1093/jamia/ocac052.
Artificial intelligence (AI) models may propagate harmful biases in performance and hence negatively affect the underserved. We aimed to assess the degree to which the data quality of electronic health records (EHRs), affected by inequities related to low socioeconomic status (SES), results in differential performance of AI models across SES levels.
This study used existing machine learning models for predicting asthma exacerbation in children with asthma. We compared the balanced error rate (BER) across SES levels, as measured by the HOUsing-based SocioEconomic Status (HOUSES) index. As a possible mechanism for differential performance, we also compared the incompleteness of EHR information relevant to asthma care by SES.
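The BER comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the common definition of BER as the mean of the false negative rate and the false positive rate for a binary outcome, and the function names, group labels, and toy labels are all hypothetical.

```python
from collections import defaultdict


def balanced_error_rate(y_true, y_pred):
    """BER = (FNR + FPR) / 2 for binary labels (1 = exacerbation, assumed coding)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("BER needs at least one positive and one negative case")
    return 0.5 * (fn / pos + fp / neg)


def ber_by_group(y_true, y_pred, groups):
    """Compute BER separately within each SES group label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: balanced_error_rate(ts, ps) for g, (ts, ps) in buckets.items()}


# Toy data (made up for illustration; not study data).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["Q1"] * 4 + ["Q2-Q4"] * 4

ber = ber_by_group(y_true, y_pred, groups)
ber_ratio = ber["Q1"] / ber["Q2-Q4"]  # ratio > 1 means worse performance in Q1
```

A ratio above 1 for the lowest HOUSES quartile relative to the others would indicate higher error in the lower-SES group, which is the kind of contrast the study reports.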
Asthmatic children with lower SES had larger BER than those with higher SES (eg, ratio = 1.35 for HOUSES Q1 vs Q2-Q4) and had a higher proportion of missing information relevant to asthma care (eg, 41% vs 24% for missing asthma severity and 12% vs 9.8% for undiagnosed asthma despite meeting asthma criteria).
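The missing-information comparison can be illustrated in the same spirit. The sketch below assumes each EHR record is a dictionary carrying a hypothetical `houses_group` label and an `asthma_severity` field that is `None` when undocumented; the field names and records are invented for illustration and do not reflect the study's data model.

```python
def missing_proportion(records, field, group_key="houses_group"):
    """Share of records per SES group whose `field` is absent or None."""
    totals, missing = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        if r.get(field) is None:
            missing[g] = missing.get(g, 0) + 1
    return {g: missing.get(g, 0) / totals[g] for g in totals}


# Toy records (made up for illustration; not study data).
records = [
    {"houses_group": "Q1", "asthma_severity": None},
    {"houses_group": "Q1", "asthma_severity": "mild"},
    {"houses_group": "Q2-Q4", "asthma_severity": "moderate"},
    {"houses_group": "Q2-Q4", "asthma_severity": None},
    {"houses_group": "Q2-Q4", "asthma_severity": "mild"},
    {"houses_group": "Q2-Q4", "asthma_severity": "severe"},
]

props = missing_proportion(records, "asthma_severity")
```

Comparing `props["Q1"]` with `props["Q2-Q4"]` gives the same kind of contrast as the 41% vs 24% figure for missing asthma severity.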
Our study suggests that lower SES is associated with worse predictive model performance. It also highlights the potential role of incomplete EHR data in this differential performance and suggests a way to mitigate this bias.
The HOUSES index allows AI researchers to assess bias in predictive model performance by SES. Although our case study was based on a small sample from a single site, the results highlight a potential strategy for identifying bias by using an innovative SES measure.