Buterez David, Janet Jon Paul, Oglic Dino, Liò Pietro
Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.
Molecular AI, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
Nat Commun. 2025 Jun 5;16(1):5244. doi: 10.1038/s41467-025-60252-z.
There has been a recent surge in transformer-based architectures for learning on graphs, mainly motivated by attention as an effective learning mechanism and the desire to supersede the hand-crafted operators characteristic of message passing schemes. However, concerns have been raised over their empirical effectiveness, scalability, and the complexity of their pre-processing steps, especially in relation to much simpler graph neural networks that typically perform on par with them across a wide range of benchmarks. To address these shortcomings, we consider graphs as sets of edges and propose a purely attention-based approach consisting of an encoder and an attention pooling mechanism. The encoder vertically interleaves masked and vanilla self-attention modules to learn an effective representation of edges while allowing it to handle possible misspecifications in input graphs. Despite its simplicity, the approach outperforms fine-tuned message passing baselines and recently proposed transformer-based methods on more than 70 node- and graph-level tasks, including challenging long-range benchmarks. Moreover, we demonstrate state-of-the-art performance across tasks ranging from molecular and vision graphs to heterophilous node classification. The approach also outperforms graph neural networks and transformers in transfer learning settings, and scales much better than alternatives with a similar level of performance or expressive power.
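To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of the edge-set idea: edges are treated as tokens, masked self-attention (restricted to pairs of edges that share an endpoint) is interleaved with vanilla self-attention, and a learned-query attention pooling produces a graph embedding. All names (EdgeSetEncoder, AttentionPool, edge_mask) and design details (layer counts, how the mask is built) are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the edge-set attention idea from the abstract.
# Module names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn


class EdgeSetEncoder(nn.Module):
    """Vertically interleaves masked and vanilla self-attention over edge tokens."""

    def __init__(self, dim: int, heads: int = 4, layers: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim,
                                       batch_first=True)
            for _ in range(layers)
        )

    def forward(self, edge_tokens: torch.Tensor, edge_mask: torch.Tensor) -> torch.Tensor:
        # edge_tokens: (batch, num_edges, dim)
        # edge_mask:   (num_edges, num_edges) bool; True = attention blocked
        for i, block in enumerate(self.blocks):
            mask = edge_mask if i % 2 == 0 else None  # masked layer, then vanilla layer
            edge_tokens = block(edge_tokens, src_mask=mask)
        return edge_tokens


class AttentionPool(nn.Module):
    """Pools the edge set into one graph embedding via a learned query vector."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, edge_tokens: torch.Tensor) -> torch.Tensor:
        q = self.query.expand(edge_tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, edge_tokens, edge_tokens)
        return pooled.squeeze(1)  # (batch, dim)


# Toy usage: a 4-node graph with 5 edges; the mask restricts the "masked"
# layers to pairs of edges that share an endpoint (line-graph adjacency).
dim, E = 32, 5
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0], [1, 3]])
tokens = torch.randn(1, E, dim)  # one token per edge
share = (edges[:, None, :, None] == edges[None, :, None, :]).any(-1).any(-1)
mask = ~share  # block attention between edges with no common endpoint
graph_emb = AttentionPool(dim)(EdgeSetEncoder(dim)(tokens, mask))
print(graph_emb.shape)  # torch.Size([1, 32])
```

In this sketch the even-indexed layers attend only along the line-graph adjacency while the odd-indexed layers attend globally, mirroring the abstract's description of vertically interleaved masked and vanilla self-attention; the actual masking pattern, pooling design, and layer ordering used in the paper may differ.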