Weakly Supervised Learning for Textbook Question Answering

Authors

Ma Jie, Chai Qi, Huang Jingyue, Liu Jun, You Yang, Zheng Qinghua

Publication information

IEEE Trans Image Process. 2022;31:7378-7388. doi: 10.1109/TIP.2022.3180563. Epub 2022 Dec 1.

DOI: 10.1109/TIP.2022.3180563
PMID: 35687625
Abstract

Textbook Question Answering (TQA) is the task of answering diagram and non-diagram questions given large multi-modal contexts consisting of abundant text and diagrams. Because of the task's specificity, deep text understanding and effective learning of diagram semantics are both important. In this paper, we propose a Weakly Supervised learning method for TQA (WSTQ), which treats the imperfect results of the task's essential intermediate procedures as supervision to build Text Matching (TM) and Relation Detection (RD) tasks, and then uses these tasks to learn strong text comprehension and diagram semantics, respectively. Specifically, we use the results of text retrieval to build positive and negative text pairs. To learn deep text understanding, we first pre-train the text understanding module of WSTQ on TM and then fine-tune it on TQA. We build positive and negative relation pairs by checking whether the items/regions detected in diagrams by object detection overlap. The RD task forces our method to learn the relationships between regions, which are crucial for expressing diagram semantics. We train WSTQ on RD and TQA simultaneously, i.e., multitask learning, to obtain effective diagram semantics and thereby improve TQA performance. Extensive experiments are carried out on CK12-QA and AI2D to verify the effectiveness of WSTQ. Experimental results show that our method improves accuracy by 5.02% and 4.12% on the test splits of these datasets, respectively, over the current state-of-the-art baseline. We have released our code at https://github.com/dr-majie/WSTQ.
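
The relation-pair construction described in the abstract can be illustrated with a minimal sketch. This is not the released WSTQ code. It assumes detected regions arrive as axis-aligned boxes in `(x1, y1, x2, y2)` format and that "any overlap" means an intersection of positive area. The names `boxes_overlap` and `build_relation_pairs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share a region of positive area."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    return inter_w > 0 and inter_h > 0


def build_relation_pairs(boxes: List[Box]) -> List[Tuple[int, int, int]]:
    """Label every unordered pair of regions: 1 (positive) if they overlap, else 0."""
    return [
        (i, j, int(boxes_overlap(boxes[i], boxes[j])))
        for i, j in combinations(range(len(boxes)), 2)
    ]


# Hypothetical detector output for one diagram.
detected = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
pairs = build_relation_pairs(detected)
print(pairs)  # [(0, 1, 1), (0, 2, 0), (1, 2, 0)]
```

Because the labels come from an imperfect object detector rather than human annotation, they are noisy, which is what makes the supervision weak. The actual overlap criterion and pair sampling used by the authors may differ from this sketch.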
