
Visual Dialog.

Authors

Das Abhishek, Kottur Satwik, Gupta Khushi, Singh Avi, Yadav Deshraj, Lee Stefan, Moura Jose, Parikh Devi, Batra Dhruv

Publication information

IEEE Trans Pattern Anal Mach Intell. 2018 Apr 19. doi: 10.1109/TPAMI.2018.2828437.

DOI:10.1109/TPAMI.2018.2828437
PMID:29993628
Abstract

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being sufficiently grounded in vision to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person real-time chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). VisDial v0.9 has been released and consists of dialog question-answer pairs from 10-round, human-human dialogs grounded in images from the COCO dataset.
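
The abstract defines each turn of the task by three inputs (an image, the dialog history and the current question) and says individual responses can be evaluated objectively. The Python sketch below is illustrative only and rests on assumptions not stated in this abstract. The names `DialogRound`, `VisualDialogInput`, `history_as_text`, `rank_of_ground_truth` and `retrieval_metrics` are hypothetical. The scoring assumes a retrieval-style protocol in which the agent ranks a list of candidate answers and is judged by where the human answer lands (recall@k, mean reciprocal rank, mean rank), which is the style of evaluation commonly reported on VisDial.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class DialogRound:
    """One question-answer exchange from the dialog history (hypothetical structure)."""
    question: str
    answer: str


@dataclass
class VisualDialogInput:
    """Inputs at one round: an image reference, the earlier rounds, and the new question."""
    image_id: int
    history: List[DialogRound] = field(default_factory=list)
    question: str = ""


def history_as_text(example: VisualDialogInput) -> str:
    """Flatten earlier rounds into one string (one simple, assumed way to expose context)."""
    return " ".join(f"Q: {r.question} A: {r.answer}" for r in example.history)


def rank_of_ground_truth(scores: Sequence[float], gt_index: int) -> int:
    """1-based rank of the human answer when candidates are sorted by descending score.

    Only candidates scoring strictly higher are counted, so ties favor the ground truth.
    """
    gt_score = scores[gt_index]
    return 1 + sum(1 for i, s in enumerate(scores) if i != gt_index and s > gt_score)


def retrieval_metrics(ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """Recall@k, mean reciprocal rank and mean rank over a set of questions."""
    n = len(ranks)
    metrics = {f"R@{k}": sum(r <= k for r in ranks) / n for k in ks}
    metrics["MRR"] = sum(1.0 / r for r in ranks) / n
    metrics["MeanRank"] = sum(ranks) / n
    return metrics


if __name__ == "__main__":
    example = VisualDialogInput(
        image_id=1,  # hypothetical image identifier
        history=[DialogRound("is there a dog?", "yes"), DialogRound("is it outside?", "yes")],
        question="what color is it?",
    )
    print(history_as_text(example))  # prints "Q: is there a dog? A: yes Q: is it outside? A: yes"
    # Hypothetical model scores for four candidate answers; index 2 is the human answer.
    print(rank_of_ground_truth([0.1, 0.9, 0.5, 0.7], gt_index=2))  # prints 3
    print(retrieval_metrics([1, 3, 12]))
```

With ranks of 1, 3 and 12 for three questions, the metrics come out to R@1 = 1/3, R@5 = R@10 = 2/3, MRR = 17/36 ≈ 0.47 and mean rank = 16/3 ≈ 5.33. Higher recall and MRR are better, and a lower mean rank is better. Counting only strictly higher-scoring candidates is an optimistic tie-breaking choice; a stricter variant would place tied candidates ahead of the human answer.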

Similar articles

1. Visual Dialog.
   IEEE Trans Pattern Anal Mach Intell. 2018 Apr 19. doi: 10.1109/TPAMI.2018.2828437.
2. Context-Aware Graph Inference With Knowledge Distillation for Visual Dialog.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6056-6073. doi: 10.1109/TPAMI.2021.3085755. Epub 2022 Sep 14.
3. Saying the Unseen: Video Descriptions via Dialog Agents.
   IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):7190-7204. doi: 10.1109/TPAMI.2021.3093360. Epub 2022 Sep 14.
4. Semantic and pragmatic precision in conversational AI systems.
   Front Artif Intell. 2023 Mar 30;6:896729. doi: 10.3389/frai.2023.896729. eCollection 2023.
5. NMN-VD: A Neural Module Network for Visual Dialog.
   Sensors (Basel). 2021 Jan 30;21(3):931. doi: 10.3390/s21030931.
6. Knowledge graph assisted end-to-end medical dialog generation.
   Artif Intell Med. 2023 May;139:102535. doi: 10.1016/j.artmed.2023.102535. Epub 2023 Mar 23.
7. Multitask Learning and Reinforcement Learning for Personalized Dialog Generation: An Empirical Study.
   IEEE Trans Neural Netw Learn Syst. 2021 Jan;32(1):49-62. doi: 10.1109/TNNLS.2020.2975035. Epub 2021 Jan 4.
8. Inverse Visual Question Answering: A New Benchmark and VQA Diagnosis Tool.
   IEEE Trans Pattern Anal Mach Intell. 2020 Feb;42(2):460-474. doi: 10.1109/TPAMI.2018.2880185. Epub 2018 Nov 9.
9. Dialog organization and functional communication in a medical assistance task by phone.
   Percept Mot Skills. 1995 Oct;81(2):451-61. doi: 10.1177/003151259508100218.
10. A Survey on Learning-Based Approaches for Modeling and Classification of Human-Machine Dialog Systems.
   IEEE Trans Neural Netw Learn Syst. 2021 Apr;32(4):1418-1432. doi: 10.1109/TNNLS.2020.2985588. Epub 2021 Apr 2.

Cited by

1. A Review of Multi-Modal Learning from the Text-Guided Visual Processing Viewpoint.
   Sensors (Basel). 2022 Sep 8;22(18):6816. doi: 10.3390/s22186816.
2. Joint Multimodal Embedding and Backtracking Search in Vision-and-Language Navigation.
   Sensors (Basel). 2021 Feb 2;21(3):1012. doi: 10.3390/s21031012.
3. An Effective Dense Co-Attention Networks for Visual Question Answering.
   Sensors (Basel). 2020 Aug 30;20(17):4897. doi: 10.3390/s20174897.