IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5150-5161. doi: 10.1109/TNNLS.2021.3069230. Epub 2022 Oct 5.
Learning meaningful representations of free-hand sketches remains a challenging task because sketches are sparse signals with a high level of abstraction. Existing techniques exploit either the static nature of sketches with convolutional neural networks (CNNs) or their temporal sequential property with recurrent neural networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel graph neural network (GNN), the multigraph transformer (MGT), for learning representations of sketches from multiple graphs, which simultaneously capture global and local geometric stroke structures as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. In particular, when applied to 414k sketches from Google QuickDraw, MGT: 1) comes within a small recognition gap of the CNN-based performance upper bound (72.80% versus 74.22%) while running inference faster than the CNN competitors and 2) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work to represent sketches as graphs and apply GNNs to sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.
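To make the idea of representing one sketch as multiple sparsely connected graphs more concrete, the following is a minimal sketch in Python. The abstract does not specify how the graphs are built, so everything here is an assumption made for illustration: the function name `build_sketch_graphs`, the `hops` parameter, and the choice of two graphs (a local graph linking points that are close in drawing order within a stroke, and a global graph linking all points of the same stroke). It should not be read as the construction used in the released code.

```python
def build_sketch_graphs(stroke_lengths, hops=1):
    """Build two boolean adjacency matrices over the points of one sketch.

    stroke_lengths: number of points in each stroke, in drawing order.
    hops: how far along the drawing order a local edge may reach.

    Hypothetical construction for illustration only; the abstract does not
    describe the actual graph definitions used by MGT.
    """
    # Assign each point the index of the stroke it belongs to.
    # Points are numbered in drawing order, which carries the temporal information.
    stroke_id = [s for s, length in enumerate(stroke_lengths) for _ in range(length)]
    n = len(stroke_id)

    # Local graph: points in the same stroke that are at most `hops` steps apart.
    local_adj = [
        [stroke_id[i] == stroke_id[j] and abs(i - j) <= hops for j in range(n)]
        for i in range(n)
    ]
    # Global graph: every pair of points that share a stroke.
    global_adj = [
        [stroke_id[i] == stroke_id[j] for j in range(n)]
        for i in range(n)
    ]
    return local_adj, global_adj


# A toy sketch with two strokes of 3 and 2 points.
local_adj, global_adj = build_sketch_graphs([3, 2])
```

Under these assumptions, both matrices are sparse relative to a fully connected graph over all points, and each could serve as a separate attention mask so that different parts of a Transformer-style model attend over different neighborhoods.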