

To Boost Zero-Shot Generalization for Embodied Reasoning With Vision-Language Pre-Training.

Authors

Su Ke, Zhang Xingxing, Zhang Siyang, Zhu Jun, Zhang Bo

Publication

IEEE Trans Image Process. 2024;33:5370-5381. doi: 10.1109/TIP.2024.3459800. Epub 2024 Oct 2.

DOI: 10.1109/TIP.2024.3459800
PMID: 39292596
Abstract

Recently, there has been increased research interest in embodied artificial intelligence (EAI), in which an agent learns to perform a specific task while dynamically interacting with the surrounding 3D environment. A new challenge therein is that many unseen objects may appear, owing to the increased number of object categories in 3D scenes. This makes it necessary to develop models with strong zero-shot generalization to new objects. Existing work pursues this goal by providing embodied agents with massive, high-quality human annotations closely related to the task to be learned, which is too costly in practice. Inspired by recent advances in pre-trained models for 2D visual tasks, we attempt to boost zero-shot generalization for embodied reasoning with vision-language pre-training, which can encode common sense as general prior knowledge. To further improve performance on a specific task, we rectify the pre-trained representation through masked scene graph modeling (MSGM) in a self-supervised manner, where task-specific knowledge is learned from iterative message passing. Our method improves a variety of representative embodied reasoning tasks by a large margin (e.g., over 5.0% w.r.t. answer accuracy on the MP3D-EQA dataset, which consists of many real-world scenes with a large number of new objects during testing) and achieves new state-of-the-art performance.
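The core MSGM idea described above — mask out a node of the scene graph and reconstruct it from its neighbors through iterative message passing — can be sketched as a toy NumPy illustration. Everything here (the graph, the feature dimensions, the mean-aggregation update, the names `message_passing` and `mask_idx`) is a made-up assumption for illustration, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scene graph: 4 object nodes (say, table / chair / lamp / door), undirected edges.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
adj_norm = adj / adj.sum(axis=1, keepdims=True)  # row-normalize for mean aggregation

dim = 8
feats = rng.normal(size=(4, dim))     # stand-in for pre-trained vision-language features
W = rng.normal(size=(dim, dim)) * 0.1  # shared transform applied after aggregation

def message_passing(h, steps=3):
    """Iteratively aggregate neighbor features (mean) and transform them."""
    for _ in range(steps):
        h = np.tanh(adj_norm @ h @ W + h)  # residual update keeps node identity
    return h

# Masked scene graph modeling: hide one node's input feature, then ask the
# message-passing network to reconstruct it from its neighbors. Training would
# minimize this reconstruction error; here we only measure it once.
mask_idx = 2
masked = feats.copy()
masked[mask_idx] = 0.0  # replace with a mask token (zeros here)

refined = message_passing(masked)
recon_error = np.linalg.norm(refined[mask_idx] - feats[mask_idx])
print(refined.shape, float(recon_error) > 0)
```

In a real system the reconstruction loss would be backpropagated to fine-tune the pre-trained representation, which is the self-supervised "rectification" step the abstract refers to.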


Similar Articles

1
Composite Object Relation Modeling for Few-Shot Scene Recognition.
IEEE Trans Image Process. 2023;32:5678-5691. doi: 10.1109/TIP.2023.3321475. Epub 2023 Oct 17.
2
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8517-8533. doi: 10.1109/TPAMI.2024.3410324. Epub 2024 Nov 6.
3
Knowledge-Based Embodied Question Answering.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):11948-11960. doi: 10.1109/TPAMI.2023.3277206. Epub 2023 Sep 5.
4
PointGLR: Unsupervised Structural Representation Learning of 3D Point Clouds.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2193-2207. doi: 10.1109/TPAMI.2022.3159794. Epub 2023 Jan 6.
5
Multi-view graph representation with similarity diffusion for general zero-shot learning.
Neural Netw. 2023 Sep;166:38-50. doi: 10.1016/j.neunet.2023.06.045. Epub 2023 Jul 7.
6
Zero-shot visual reasoning through probabilistic analogical mapping.
Nat Commun. 2023 Aug 24;14(1):5144. doi: 10.1038/s41467-023-40804-x.
7
Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-Shot Logical Reasoning Over Text.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18499-18511. doi: 10.1109/TNNLS.2023.3317254. Epub 2024 Dec 2.
8
Vision-Language Navigation Policy Learning and Adaptation.
IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4205-4216. doi: 10.1109/TPAMI.2020.2972281. Epub 2021 Nov 3.
9
Multi-label zero-shot learning with graph convolutional networks.
Neural Netw. 2020 Dec;132:333-341. doi: 10.1016/j.neunet.2020.09.010. Epub 2020 Sep 21.