Afrose Sharmin, Song Wenjia, Nemeroff Charles B, Lu Chang, Yao Danfeng Daphne
Department of Computer Science, Virginia Tech, Blacksburg, VA USA.
Department of Psychiatry and Behavioral Sciences, The University of Texas at Austin Dell Medical School, Austin, TX USA.
Commun Med (Lond). 2022 Sep 1;2:111. doi: 10.1038/s43856-022-00165-w. eCollection 2022.
BACKGROUND: Many clinical datasets are intrinsically imbalanced, dominated by an overwhelming majority group. Off-the-shelf machine learning models that optimize prognosis for the majority patient type (e.g., the healthy class) may cause substantial errors on the minority prediction class (e.g., the disease class) and on demographic subgroups (e.g., Black or young patients). In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist but go unreported. In addition, some widely used whole-population metrics give misleading results.

METHODS: We design a double prioritized (DP) bias correction technique to mitigate representational biases in machine-learning-based prognosis. Our method trains customized machine learning models for specific ethnic or age groups, a substantial departure from the one-model-predicts-all convention. We compare DP with other sampling and reweighting techniques on mortality and cancer survivability prediction tasks.

RESULTS: We first provide empirical evidence of various prediction deficiencies in a typical machine learning setting without bias correction. For example, in mortality prediction, missed death cases are 3.14 times more frequent than missed survival cases. We then show that DP consistently boosts minority-class recall for underrepresented groups, by up to 38.0%. DP also reduces relative disparities across race and age groups: in terms of the relative disparity of minority-class recall, it outperforms the 8 existing sampling solutions by up to 88.0%. Cross-race and cross-age-group evaluation further suggests the need for subpopulation-specific machine learning models.

CONCLUSIONS: Biases exist in the widely accepted one-machine-learning-model-fits-all-population approach. We introduce a bias correction method that produces specialized machine learning prognostication models for underrepresented racial and age groups. This technique may reduce potentially life-threatening prediction mistakes for minority populations.
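The DP technique described in METHODS prioritizes two attributes at once when rebalancing training data: the minority prediction class and the underrepresented demographic group. A minimal sketch of that core idea follows; the sample structure, field names, and fixed duplication factor are illustrative assumptions, not the authors' implementation (the paper tunes the amount of duplication per group rather than fixing it).

```python
# Minimal sketch of double prioritized (DP)-style oversampling.
# Assumptions (not from the paper's code): training samples are dicts
# with "label" and "group" keys, and dup_factor is supplied directly;
# in practice it would be selected on a validation set by maximizing
# minority-class recall for the target group.

def dp_oversample(samples, target_group, minority_label, dup_factor):
    """Return a training set in which samples belonging to BOTH the
    minority prediction class and the underrepresented demographic
    group are duplicated dup_factor additional times."""
    prioritized = [s for s in samples
                   if s["label"] == minority_label and s["group"] == target_group]
    return samples + prioritized * dup_factor

# Hypothetical toy data: mortality labels (1 = death, the minority class)
# across two demographic groups.
train = (
    [{"group": "White", "label": 0}] * 6
    + [{"group": "White", "label": 1}] * 2
    + [{"group": "Black", "label": 0}] * 3
    + [{"group": "Black", "label": 1}] * 1
)
augmented = dp_oversample(train, target_group="Black",
                          minority_label=1, dup_factor=4)
# Only the (Black, death) samples are replicated; every other sample,
# including minority-class samples from other groups, is left untouched.
```

A model trained on `augmented` is then specialized for the target subgroup, in line with the paper's departure from the one-model-predicts-all convention.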
Pac Symp Biocomput. 2024
BMC Med Inform Decis Mak. 2017-12-19
JMIR Public Health Surveill. 2020-10-22
Int J Environ Res Public Health. 2020-3-12
JAMA Netw Open. 2021-4-1
JMIR Med Inform. 2022-5-31
NPJ Digit Med. 2025-6-14
J Neural Eng. 2025-6-13
PLOS Digit Health. 2025-6-5
Comput Struct Biotechnol J. 2025-4-25
Commun Med (Lond). 2025-3-11
BMC Med Inform Decis Mak. 2025-2-5
Lancet Digit Health. 2021-1
Nat Commun. 2021-2-17
Lancet Digit Health. 2020-12
Sci Data. 2019-6-17