Department of Psychiatry & Behavioral Sciences, Rush University Medical Center, Chicago, Illinois, USA.
Department of Computer Science, Loyola University, Chicago, Illinois, USA.
J Am Med Inform Assoc. 2021 Oct 12;28(11):2393-2403. doi: 10.1093/jamia/ocab148.
OBJECTIVE: To assess fairness and bias of a previously validated machine learning opioid misuse classifier.
MATERIALS & METHODS: Two experiments were conducted with the classifier's original (n = 1000) and external validation (n = 53 974) datasets from 2 health systems. Bias was assessed via testing for differences in type II error rates across racial/ethnic subgroups (Black, Hispanic/Latinx, White, Other) using bootstrapped 95% confidence intervals. A local surrogate model was estimated to interpret the classifier's predictions by race and averaged globally from the datasets. Subgroup analyses and post-hoc recalibrations were conducted to attempt to mitigate biased metrics.
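A minimal sketch of the subgroup error-rate comparison described above, assuming held-out predictions are available in a table with hypothetical columns race, y_true, and y_pred; the paper's actual variable names and bootstrap settings are not specified here.

```python
import numpy as np
import pandas as pd

def fnr(y_true, y_pred):
    """False negative rate: share of true positives the classifier misses."""
    positives = y_true == 1
    if positives.sum() == 0:
        return np.nan
    return np.mean(y_pred[positives] == 0)

def bootstrap_fnr_gap(df, group_a, group_b, n_boot=2000, seed=0):
    """Bootstrapped 95% CI for FNR(group_a) - FNR(group_b)."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_boot):
        # Resample rows with replacement, then recompute the subgroup FNR gap.
        idx = rng.integers(0, len(df), size=len(df))
        sample = df.iloc[idx]
        a = sample[sample["race"] == group_a]
        b = sample[sample["race"] == group_b]
        gaps.append(fnr(a["y_true"].values, a["y_pred"].values)
                    - fnr(b["y_true"].values, b["y_pred"].values))
    lo, hi = np.nanpercentile(gaps, [2.5, 97.5])
    return lo, hi

# Example (hypothetical DataFrame preds_df): a CI excluding 0 would suggest
# a subgroup difference in the type II error rate.
# lo, hi = bootstrap_fnr_gap(preds_df, "Black", "White")
```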
RESULTS: We identified bias in the false negative rate (FNR = 0.32) of the Black subgroup compared to the FNR (0.17) of the White subgroup. Top features included "heroin" and "substance abuse" across subgroups. Post-hoc recalibrations eliminated bias in FNR with minimal changes in other subgroup error metrics. The Black FNR subgroup had higher risk scores for readmission and mortality than the White FNR subgroup, and a higher mortality risk score than the Black true positive subgroup (P < .05).
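A hedged sketch of one post-hoc recalibration strategy consistent with the abstract: choosing group-specific decision thresholds so that each subgroup's FNR matches a reference group's. The column names (race, y_true, y_score), the 0.5 baseline cutoff, and scores in [0, 1] are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
import pandas as pd

def threshold_for_target_fnr(y_true, y_score, target_fnr):
    """Largest threshold whose FNR does not exceed target_fnr (scores in [0, 1])."""
    pos_scores = np.sort(y_score[y_true == 1])
    # FNR at threshold t is the fraction of positive-class scores below t.
    candidates = np.unique(np.concatenate(([0.0], pos_scores, [1.0])))
    feasible = [t for t in candidates if np.mean(pos_scores < t) <= target_fnr]
    return max(feasible) if feasible else candidates.min()

def recalibrate_by_group(df, reference_group="White", ref_threshold=0.5):
    """Per-group thresholds matching each subgroup's FNR to the reference group's."""
    ref = df[df["race"] == reference_group]
    ref_fnr = np.mean(ref.loc[ref["y_true"] == 1, "y_score"] < ref_threshold)
    return {g: threshold_for_target_fnr(sub["y_true"].values,
                                        sub["y_score"].values, ref_fnr)
            for g, sub in df.groupby("race")}

# Example (hypothetical DataFrame preds_df with predicted probabilities):
# thresholds = recalibrate_by_group(preds_df)
# preds_df["y_pred_recal"] = preds_df["y_score"] >= preds_df["race"].map(thresholds)
```

Equalizing FNR this way trades some type I error in the recalibrated groups for lower type II error, which mirrors the abstract's finding that the recalibration produced only minimal changes in the other subgroup error metrics.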
DISCUSSION: The Black FNR subgroup had the greatest severity of disease and risk for poor outcomes. Similar features predicted opioid misuse across subgroups, but inequities were present. Post-hoc techniques mitigated bias in the type II error rate without creating substantial type I error. From model design through deployment, bias and data disadvantages should be systematically addressed.
CONCLUSIONS: Standardized, transparent bias assessments are needed to improve trustworthiness in clinical machine learning models.