Video Question Answering With Prior Knowledge and Object-Sensitive Learning.

Publication information

IEEE Trans Image Process. 2022;31:5936-5948. doi: 10.1109/TIP.2022.3205212. Epub 2022 Sep 15.

DOI:10.1109/TIP.2022.3205212
PMID:36083958
Abstract

Video Question Answering (VideoQA), which explores the spatial-temporal visual information of a video given a linguistic query, has received unprecedented attention in recent years. One of the main challenges lies in locating the relevant visual and linguistic information, and various attention-based approaches have therefore been proposed. Despite impressive progress, current methods leave two aspects underexplored when computing attention. First, prior knowledge, which plays an important role in human cognition and can assist the reasoning process of VideoQA, is not fully utilized. Second, the value of structured visual information (e.g., objects), as opposed to raw video, is underestimated. To address these two issues, we propose a Prior Knowledge and Object-sensitive Learning (PKOL) approach that explores the effect of prior knowledge and learns object-sensitive representations to boost the VideoQA task. Specifically, we first propose a Prior Knowledge Exploring (PKE) module that acquires prior knowledge and integrates it into the question feature to enrich it; an information retriever is constructed to retrieve related sentences from a massive corpus as prior knowledge. In addition, we propose an Object-sensitive Representation Learning (ORL) module that generates object-sensitive features by letting object-level features interact with frame-level and clip-level features. Our proposed PKOL achieves consistent improvements on three competitive benchmarks (i.e., MSVD-QA, MSRVTT-QA, and TGIF-QA) and attains state-of-the-art performance. The source code is available at https://github.com/zchoi/PKOL.
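
The two ideas in the abstract, retrieving related sentences as prior knowledge and letting object-level features interact with a frame-level feature, can be illustrated with a minimal sketch. It is not the PKOL implementation: the bag-of-words cosine retriever and the plain dot-product attention are stand-ins chosen for brevity, and every name here (`bow`, `cosine`, `retrieve`, `attend`, the toy corpus and vectors) is a hypothetical introduced for illustration only.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (a stand-in for a learned sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, corpus, k=2):
    """Return the k corpus sentences most similar to the question."""
    q = bow(question)
    ranked = sorted(corpus, key=lambda s: cosine(q, bow(s)), reverse=True)
    return ranked[:k]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys):
    """Dot-product attention: weight each key vector by softmax(query . key)
    and return the weighted sum (assumed here as a generic interaction step)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys))
            for i in range(len(query))]


# Toy corpus and features, invented for this example.
corpus = [
    "a man is playing a guitar on stage",
    "a cat is sleeping on the sofa",
    "a woman is slicing an onion",
]
prior = retrieve("what is the man playing", corpus, k=1)

frame_feature = [1.0, 0.0]                    # hypothetical frame-level feature
object_features = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical object-level features
object_sensitive = attend(frame_feature, object_features)
```

In this toy setup the retrieved sentence would be concatenated with, or encoded alongside, the question before answer prediction, and `object_sensitive` leans toward the object whose feature aligns with the frame feature. How PKOL actually fuses these signals is specified in the paper and its repository, not here.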
