A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective.

Author information

Chen Chaoqi, Wu Yushuang, Dai Qiyuan, Zhou Hong-Yu, Xu Mutian, Yang Sibei, Han Xiaoguang, Yu Yizhou

Publication information

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10297-10318. doi: 10.1109/TPAMI.2024.3445463. Epub 2024 Nov 6.

DOI:10.1109/TPAMI.2024.3445463

Abstract

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.
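As a rough illustration of the contrast the abstract draws between local neighborhood aggregation and graph Transformers, the sketch below compares one round of mean aggregation over direct neighbors with global self-attention whose scores receive an additive bonus on existing edges. This is a minimal sketch under stated assumptions, not a method taken from the survey: the function names, the `edge_bias` parameter, the toy path graph, and the omission of learned query/key/value projections are all hypothetical simplifications.

```python
import math

def mean_aggregate(adj, feats):
    """One round of local aggregation: average each node with its direct neighbors."""
    dim = len(feats[0])
    out = []
    for i, row in enumerate(adj):
        # A node only sees itself and nodes it shares an edge with.
        nbrs = [j for j, a in enumerate(row) if a or j == i]
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out

def graph_attention(adj, feats, edge_bias=1.0):
    """Global self-attention over all nodes, with a structural bias on edges.

    Assumption: queries, keys, and values are the raw features (no learned
    projections), and `edge_bias` is a hypothetical scalar added to the
    attention score of connected pairs.
    """
    n, dim = len(feats), len(feats[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            dot = sum(feats[i][d] * feats[j][d] for d in range(dim)) / math.sqrt(dim)
            scores.append(dot + (edge_bias if adj[i][j] else 0.0))
        # Numerically stable softmax over every node, not just neighbors.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * feats[j][d] for j in range(n)) for d in range(dim)])
    return out

# Toy path graph 0 - 1 - 2 - 3 with one-hot node features.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
FEATS = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

local = mean_aggregate(ADJ, FEATS)
global_ = graph_attention(ADJ, FEATS)
```

With one-hot features, `local[0]` is `[0.5, 0.5, 0.0, 0.0]`: after one round, node 0 has received nothing from node 3, which is three hops away. In `global_[0]` the entry for node 3 is already positive, while the edge bias still gives the direct neighbor (node 1) more weight than non-neighbors.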
