NMN-VD: A Neural Module Network for Visual Dialog.

Affiliation

Department of Computer Science, Kyonggi University, Suwon 16227, Korea.

Publication information

Sensors (Basel). 2021 Jan 30;21(3):931. doi: 10.3390/s21030931.

DOI:10.3390/s21030931
PMID:33573265
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7866498/

Abstract

Visual dialog demonstrates several important aspects of multimodal artificial intelligence; however, it is hindered by visual grounding and visual coreference resolution problems. To overcome these problems, we propose the novel neural module network for visual dialog (NMN-VD). NMN-VD is an efficient question-customized modular network model that combines only the modules required for deciding answers after analyzing input questions. In particular, the model includes a module that effectively finds the visual area indicated by a pronoun using a reference pool to solve a visual coreference resolution problem, which is an important challenge in visual dialog. In addition, the proposed NMN-VD model includes a method for distinguishing and handling impersonal pronouns that do not require visual coreference resolution from general pronouns. Furthermore, a new module that effectively handles comparison questions found in visual dialogs is included in the model, as well as a module that applies a triple-attention mechanism to solve visual grounding problems between the question and the image. The results of various experiments conducted using a set of large-scale benchmark data verify the efficacy and high performance of our proposed NMN-VD model.
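
The abstract gives no implementation details, so the following is only a toy sketch of the two ideas it names: choosing modules per question, and resolving a pronoun through a reference pool of earlier groundings. It is not the authors' NMN-VD implementation. Every name here (`ReferencePool`, `plan_modules`, `find`, the keyword sets) is a hypothetical stand-in, and the keyword heuristics replace what would be learned components in a real model.

```python
from dataclasses import dataclass, field

# Hypothetical word lists; a real model would learn these distinctions.
PRONOUNS = {"it", "he", "she", "they", "them", "him", "her"}
# Cues that "it" is impersonal (e.g. "Is it sunny?") and needs no visual referent.
IMPERSONAL_CUES = {"sunny", "raining", "cloudy", "daytime", "night"}


@dataclass
class ReferencePool:
    """Stores (phrase, attention) pairs grounded in earlier dialog rounds."""
    entries: list = field(default_factory=list)

    def add(self, phrase, attention):
        self.entries.append((phrase, attention))

    def resolve(self):
        """Toy policy: reuse the most recently grounded attention map."""
        return self.entries[-1][1] if self.entries else None


def find(region_labels, noun):
    """Toy grounding: attention 1.0 on regions whose label equals the noun."""
    return [1.0 if label == noun else 0.0 for label in region_labels]


def plan_modules(question):
    """Select only the modules this question needs (keyword heuristic)."""
    tokens = question.lower().rstrip("?").split()
    plan = []
    if any(t in PRONOUNS for t in tokens):
        impersonal = "it" in tokens and any(t in IMPERSONAL_CUES for t in tokens)
        # Impersonal pronouns skip coreference; others go to the reference pool.
        plan.append("skip_coref" if impersonal else "refer")
    else:
        plan.append("find")
    if "than" in tokens:
        plan.append("compare")  # comparison questions get a dedicated module
    plan.append("answer")
    return plan


# Round 1 grounds "dog"; a later "it" can then be resolved from the pool.
regions = ["dog", "cat", "sofa"]
pool = ReferencePool()
pool.add("dog", find(regions, "dog"))
```

With this setup, `plan_modules("What color is it?")` routes through the `refer` step, and `pool.resolve()` returns the attention map stored for "dog" in the earlier round, while a question such as "Is it sunny?" bypasses coreference entirely.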

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/1466a4e52d4c/sensors-21-00931-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/2e2d45e6efd4/sensors-21-00931-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/462ec17ccb83/sensors-21-00931-g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/fc5ddca94cd0/sensors-21-00931-g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/93164d63e6b9/sensors-21-00931-g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/931686fc2188/sensors-21-00931-g006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/fab226bbc7d1/sensors-21-00931-g007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/f0844c6b8cb6/sensors-21-00931-g008.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/d2949bc9f439/sensors-21-00931-g009.jpg
Figure 10: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/33576cd5a529/sensors-21-00931-g010.jpg
Figure 11: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/fa14aec73ac1/sensors-21-00931-g011.jpg
Figure 12: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f28/7866498/6cf64a83ea8f/sensors-21-00931-g012.jpg

Similar articles

1
NMN-VD: A Neural Module Network for Visual Dialog.
Sensors (Basel). 2021 Jan 30;21(3):931. doi: 10.3390/s21030931.
2
Visual Dialog.
IEEE Trans Pattern Anal Mach Intell. 2018 Apr 19. doi: 10.1109/TPAMI.2018.2828437.
3
Semantic-Aware Modular Capsule Routing for Visual Question Answering.
IEEE Trans Image Process. 2023;32:5537-5549. doi: 10.1109/TIP.2023.3318949. Epub 2023 Oct 5.
4
Toward Accurate Visual Reasoning With Dual-Path Neural Module Networks.
Front Robot AI. 2020 Aug 21;7:109. doi: 10.3389/frobt.2020.00109. eCollection 2020.
5
Transformer Module Networks for Systematic Generalization in Visual Question Answering.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10096-10105. doi: 10.1109/TPAMI.2024.3438887. Epub 2024 Nov 7.
6
An Efficient Framework for Development of Task-Oriented Dialog Systems in a Smart Home Environment.
Sensors (Basel). 2018 May 16;18(5):1581. doi: 10.3390/s18051581.
7
Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text.
PLoS One. 2016 Mar 2;11(3):e0148538. doi: 10.1371/journal.pone.0148538. eCollection 2016.
8
Product Module Network Modeling and Evolution Analysis.
Comput Intell Neurosci. 2019 Mar 6;2019:2186916. doi: 10.1155/2019/2186916. eCollection 2019.
9
A categorical analysis of coreference resolution errors in biomedical texts.
J Biomed Inform. 2016 Apr;60:309-18. doi: 10.1016/j.jbi.2016.02.015. Epub 2016 Feb 27.
10
Natural-Language-Driven Multimodal Representation Learning for Audio-Visual Scene-Aware Dialog System.
Sensors (Basel). 2023 Sep 14;23(18):7875. doi: 10.3390/s23187875.

Cited by

1
Neuro-symbolic procedural semantics for explainable visual dialogue.
PLoS One. 2025 May 27;20(5):e0323098. doi: 10.1371/journal.pone.0323098. eCollection 2025.