A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective.

Authors

Chen Chaoqi, Wu Yushuang, Dai Qiyuan, Zhou Hong-Yu, Xu Mutian, Yang Sibei, Han Xiaoguang, Yu Yizhou

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10297-10318. doi: 10.1109/TPAMI.2024.3445463. Epub 2024 Nov 6.

Abstract

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
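
The contrast the abstract draws between local neighborhood aggregation and graph Transformers can be illustrated with a toy example. The following is a minimal sketch, not a method taken from the survey: the function names, the four-node path graph, and the optional `bias` argument (one hypothetical way of injecting graph structure into attention scores) are all assumptions made for illustration, and learned weight matrices are omitted for brevity.

```python
import math


def mean_aggregate(features, adjacency):
    """One round of local aggregation: each node averages itself and its direct neighbors."""
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adjacency[i][j]]
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def global_attention(features, bias=None):
    """Softmax dot-product attention over all node pairs.

    bias[i][j] is an optional (hypothetical) structural term added to the score,
    e.g. a value derived from the graph distance between nodes i and j.
    """
    n = len(features)
    dim = len(features[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            s = sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            if bias is not None:
                s += bias[i][j]
            scores.append(s)
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(weights[j] / total * features[j][d] for j in range(n)) for d in range(dim)])
    return out


# Toy path graph 0 - 1 - 2 - 3; only node 3 carries signal in the second feature.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
features = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

local = mean_aggregate(features, adjacency)
attn = global_attention(features)
```

In this sketch, a single round of local aggregation leaves node 0 with no information from node 3, which is three hops away, whereas one attention layer lets every node attend to every other node. Passing a `bias` matrix is one simple way such a layer could still take the graph structure into account.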

