Ma Jie, Chai Qi, Huang Jingyue, Liu Jun, You Yang, Zheng Qinghua
IEEE Trans Image Process. 2022;31:7378-7388. doi: 10.1109/TIP.2022.3180563. Epub 2022 Dec 1.
Textbook Question Answering (TQA) is the task of answering diagram and non-diagram questions given a large multi-modal context of abundant text and diagrams. Because of this specificity, deep text understanding and effective learning of diagram semantics are essential for the task. In this paper, we propose a Weakly Supervised learning method for TQA (WSTQ), which treats the imperfect results of essential intermediate procedures as weak supervision to construct Text Matching (TM) and Relation Detection (RD) tasks, and then uses these tasks to drive itself to learn strong text comprehension and rich diagram semantics, respectively. Specifically, we use the results of text retrieval to build positive and negative text pairs. To learn deep text understanding, we first pre-train the text understanding module of WSTQ on TM and then fine-tune it on TQA. We build positive and negative relation pairs by checking whether the items/regions detected in diagrams by object detection overlap. The RD task forces our method to learn the relationships between regions, which are crucial for expressing diagram semantics. We train WSTQ on RD and TQA simultaneously, i.e., via multitask learning, to obtain effective diagram semantics and thereby improve TQA performance. Extensive experiments on CK12-QA and AI2D verify the effectiveness of WSTQ. The results show that our method achieves significant accuracy improvements of 5.02% and 4.12% on the test splits of these datasets, respectively, over the current state-of-the-art baseline. Our code is available at https://github.com/dr-majie/WSTQ.
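The RD supervision described in the abstract can be sketched as follows. This is a minimal illustration, assuming axis-aligned bounding boxes in `(x1, y1, x2, y2)` format; the function names and pair-labeling convention are assumptions for exposition, not the authors' actual implementation:

```python
# Hypothetical sketch: build weak RD labels from detected diagram regions.
# Two regions whose bounding boxes overlap form a positive pair (label 1);
# disjoint regions form a negative pair (label 0).

def boxes_overlap(a, b):
    """Return True if axis-aligned boxes a and b share any area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def build_relation_pairs(boxes):
    """Label every region pair: 1 if the boxes overlap, else 0."""
    pairs = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            label = 1 if boxes_overlap(boxes[i], boxes[j]) else 0
            pairs.append(((i, j), label))
    return pairs

# Example: boxes 0 and 1 overlap; box 2 is disjoint from both.
boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
print(build_relation_pairs(boxes))
# → [((0, 1), 1), ((0, 2), 0), ((1, 2), 0)]
```

Because these labels come from object-detection output, they are only approximately correct, which is precisely what makes RD a weakly supervised auxiliary task rather than a fully labeled one.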