


Counterfactual Dual-Bias VQA: A Multimodality Debias Learning for Robust Visual Question Answering.

Authors

Wang Boyue, Ju Xiaoqian, Gao Junbin, Li Xiaoyan, Hu Yongli, Yin Baocai

Publication

IEEE Trans Neural Netw Learn Syst. 2025 Sep;36(9):16366-16378. doi: 10.1109/TNNLS.2025.3562085.

DOI: 10.1109/TNNLS.2025.3562085
PMID: 40327476
Abstract

Visual question answering (VQA) models often face two language-bias challenges. First, they tend to rely solely on the question to predict the answer, overlooking relevant information in the accompanying images. Second, even when considering the question, they may focus only on the wh-words, neglecting other crucial keywords that could enhance interpretability and question sensitivity. Existing debiasing methods address this by training a bias model on question-only inputs to improve the robustness of the target VQA model; however, this approach may not fully capture the language bias present. In this article, we propose a multimodality counterfactual dual-bias model to mitigate the linguistic bias issue in target VQA models. Our approach designs a shared-parameterized dual-bias model that takes both visual and question counterfactual samples as inputs. By doing so, we aim to fully model language biases, with the visual and question counterfactual samples, respectively, emphasizing the important objects and keywords relevant to the answers. To ensure that our dual-bias model behaves similarly to an ordinary model, we freeze the parameters of the target VQA model while using cross-entropy and Kullback-Leibler (KL) divergence as the loss functions to train the dual-bias model. Subsequently, to mitigate language bias in the target VQA model, we freeze the parameters of the dual-bias model to generate pseudo-labels and then incorporate a margin loss to retrain the target VQA model. Experimental results on the VQA-CP datasets demonstrate the effectiveness of our proposed counterfactual dual-bias model. Additionally, we analyze the unsatisfactory performance on the VQA v2 dataset. The source code of the proposed model is available at https://github.com/Arrow2022jv/MCD.
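The two-stage procedure described in the abstract can be sketched with toy loss functions. The sketch below is an illustration under stated assumptions, not the paper's implementation: the exact margin-loss formulation is not given in the abstract, so the hinge on class probabilities here is one plausible form, and `stage_a_loss`, `pseudo_labels`, and `margin_loss` are hypothetical names introduced for this example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # labels: integer class indices, one per batch row
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def kl_divergence(p_logits, q_logits):
    # KL(P || Q), averaged over the batch
    p, q = softmax(p_logits), softmax(q_logits)
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1))

def stage_a_loss(dual_logits, frozen_target_logits, labels):
    # Stage A: train the dual-bias model on counterfactual inputs with
    # cross-entropy to the ground truth plus KL to the frozen target VQA
    # model, so the bias model behaves like an ordinary model.
    return cross_entropy(dual_logits, labels) + \
        kl_divergence(frozen_target_logits, dual_logits)

def pseudo_labels(dual_logits):
    # Stage B, step 1: the frozen dual-bias model's most-confident answer
    # per sample serves as the pseudo-label marking the biased prediction.
    return np.argmax(dual_logits, axis=-1)

def margin_loss(target_logits, labels, biased_labels, margin=0.2):
    # Stage B, step 2 (illustrative hinge): push the target model's
    # probability on the biased pseudo-label answer below its probability
    # on the ground-truth answer by at least `margin`.
    p = softmax(target_logits)
    idx = np.arange(len(labels))
    gap = p[idx, biased_labels] - p[idx, labels] + margin
    return np.mean(np.maximum(0.0, gap))
```

When the target already separates the true answer from the biased one by more than the margin, the hinge term vanishes and retraining leaves that sample alone; otherwise the loss penalizes reliance on the bias-model's answer.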


Similar Articles

1
Counterfactual Dual-Bias VQA: A Multimodality Debias Learning for Robust Visual Question Answering.
IEEE Trans Neural Netw Learn Syst. 2025 Sep;36(9):16366-16378. doi: 10.1109/TNNLS.2025.3562085.
2
Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering.
J Biomed Inform. 2024 Dec;160:104748. doi: 10.1016/j.jbi.2024.104748. Epub 2024 Nov 12.
3
MSB-VQA: Overcoming multiple source biases for robust visual question answering.
Neural Netw. 2025 Jul 25;192:107908. doi: 10.1016/j.neunet.2025.107908.
4
Prophet: Prompting Large Language Models With Complementary Answer Heuristics for Knowledge-Based Visual Question Answering.
IEEE Trans Pattern Anal Mach Intell. 2025 Aug;47(8):6797-6808. doi: 10.1109/TPAMI.2025.3562422.
5
Prescription of Controlled Substances: Benefits and Risks
6
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
7
Sexual Harassment and Prevention Training
8
Stigma Management Strategies of Autistic Social Media Users.
Autism Adulthood. 2025 May 28;7(3):273-282. doi: 10.1089/aut.2023.0095. eCollection 2025 Jun.
9
Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials.
Cochrane Database Syst Rev. 2014 Apr 29;2014(4):MR000034. doi: 10.1002/14651858.MR000034.pub2.
10
PathBench: Advancing the Benchmark of Large Multimodal Models for Pathology Image Understanding at Patch and Whole Slide Level.
IEEE Trans Med Imaging. 2025 Jul 2;PP. doi: 10.1109/TMI.2025.3584857.

Cited By

1
LLM-Enhanced Chinese Morph Resolution in E-Commerce Live Streaming Scenarios.
Entropy (Basel). 2025 Jun 29;27(7):698. doi: 10.3390/e27070698.