Adeli Ehsan, Zhao Qingyu, Pfefferbaum Adolf, Sullivan Edith V, Fei-Fei Li, Niebles Juan Carlos, Pohl Kilian M
Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305.
Department of Computer Science, Stanford University, CA 94305.
IEEE Winter Conf Appl Comput Vis. 2021 Jan;2021:2512-2522. doi: 10.1109/wacv48630.2021.00256. Epub 2021 Jun 14.
Presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications and has given rise to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to racial bias in gender or face recognition systems. Controlling for all types of bias at the dataset curation stage is cumbersome and sometimes impossible. The alternative is to use the available data and build models that incorporate fair representation learning. In this paper, we propose such a model based on adversarial training with two competing objectives: to learn features that have (1) maximum discriminative power with respect to the task and (2) minimal statistical mean dependence on the protected (bias) variable(s). Our approach does so by incorporating a new adversarial loss function that encourages a vanishing correlation between the bias and the learned features. We apply our method to synthetic data, to medical images (containing task bias), and to a dataset for gender classification (containing dataset bias). Our results show that the features learned by our method not only yield superior prediction performance but are also unbiased.
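To make the core idea concrete, below is a minimal PyTorch sketch (not from the paper) of a correlation-vanishing penalty: the squared Pearson correlation between each learned feature dimension and the protected variable is added to the task loss, so gradient descent pushes the linear correlation toward zero. All module and variable names are illustrative, and the sketch deliberately collapses the paper's two-player adversarial setup (which trains a separate bias-predictor network against the encoder) into a single direct penalty for brevity.

```python
import torch
import torch.nn as nn

def squared_pearson_corr(f, b, eps=1e-8):
    """Mean squared Pearson correlation between each feature dim and b.

    f: (N, D) learned features; b: (N,) protected (bias) variable.
    Returns a scalar in [0, 1]; 0 means no linear correlation.
    """
    f = f - f.mean(dim=0, keepdim=True)
    b = b - b.mean()
    cov = (f * b.unsqueeze(1)).mean(dim=0)                    # (D,) per-dim covariance
    denom = f.std(dim=0, unbiased=False) * b.std(unbiased=False) + eps
    corr = cov / denom                                        # (D,) per-dim Pearson r
    return (corr ** 2).mean()

# Toy encoder and task head standing in for the model (illustrative only).
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
task_head = nn.Linear(8, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0  # trade-off between task accuracy and decorrelation

x = torch.randn(64, 16)          # inputs
y = torch.randint(0, 2, (64,))   # task labels
b = torch.randn(64)              # protected variable, e.g. a continuous confounder

features = encoder(x)
loss = ce(task_head(features), y) + lam * squared_pearson_corr(features, b)
opt.zero_grad()
loss.backward()
opt.step()
```

The weight `lam` governs the competition between the two objectives: larger values enforce stronger statistical mean independence from the protected variable at a possible cost to task accuracy.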