Dimiccoli Mariella, Wendt Herwig
IEEE Trans Image Process. 2021;30:1476-1486. doi: 10.1109/TIP.2020.3044448. Epub 2020 Dec 31.
Self-supervised learning has recently proved effective for learning event representations suited to temporal segmentation of image sequences, where an event is understood as a set of temporally adjacent images that are semantically perceived as a whole. However, although this approach does not require expensive manual annotations, it is data-hungry and suffers from domain adaptation problems. As an alternative, we propose a novel approach for learning event representations, named Dynamic Graph Embedding (DGE). The assumption underlying our model is that a sequence of images can be represented by a graph that encodes both semantic and temporal similarity. The key novelty of DGE is that it jointly learns the graph and its graph embedding. At its core, DGE iterates over two steps: 1) updating the graph representing the semantic and temporal similarity of the data based on the current data representation, and 2) updating the data representation to take into account the current graph structure. The main advantage of DGE over state-of-the-art self-supervised approaches is that it does not require any training set; instead, it iteratively learns from the data itself a low-dimensional embedding that reflects their temporal and semantic similarity. Experimental results on two benchmark datasets of real image sequences captured at regular time intervals demonstrate that DGE yields event representations that are effective for temporal segmentation. In particular, it achieves robust temporal segmentation on the EDUBSeg and EDUBSeg-Desc benchmark datasets, outperforming the state of the art. Additional experiments on two Human Motion Segmentation benchmark datasets demonstrate the generalization capabilities of DGE.
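The two-step iteration described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the graph construction or the embedding update, so everything below is an assumption made for illustration. In particular, the PCA initialization, the Gaussian kernels, the elementwise product used to combine semantic and temporal affinity, the neighbour-averaging update, and all names (`toy_dge`, `semantic_affinity`, `temporal_affinity`, `boundary_scores`, `k`, `width`) are hypothetical.

```python
import numpy as np


def pca_init(X, dim=2):
    """Project centered features onto their top `dim` principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T


def temporal_affinity(n, width=2.0):
    """Gaussian affinity on frame-index differences (assumed form)."""
    idx = np.arange(n, dtype=float)
    return np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * width ** 2))


def semantic_affinity(Z, k=5):
    """Gaussian affinity on embedding distances (assumed form).

    The bandwidth is the mean squared distance to the k-th nearest
    neighbour, so it adapts as the embedding changes. Requires len(Z) > k.
    """
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    kth = np.sort(d2, axis=1)[:, k]  # column 0 is the self-distance (0)
    sigma2 = kth.mean() + 1e-12
    return np.exp(-d2 / (2.0 * sigma2))


def toy_dge(X, dim=2, n_iter=5, k=5, width=2.0):
    """Alternate between a graph update and an embedding update.

    X is an (n_frames, n_features) array ordered in time.
    Returns the final embedding Z and the final graph weights W.
    """
    Z = pca_init(X, dim)
    T = temporal_affinity(len(X), width)
    W = None
    for _ in range(n_iter):
        # Step 1: rebuild the graph from the current representation.
        # An edge is strong only if two frames are close in the embedding
        # AND close in time (assumed combination rule).
        W = semantic_affinity(Z, k) * T
        # Step 2: update the representation from the current graph by
        # replacing each frame with the weighted mean of its neighbours.
        # Self-loops (W[i, i] == 1) keep every row sum at least 1.
        P = W / W.sum(axis=1, keepdims=True)
        Z = P @ Z
    return Z, W


def boundary_scores(Z):
    """Distance between consecutive frames; large values suggest a boundary."""
    return np.linalg.norm(np.diff(Z, axis=0), axis=1)
```

A possible use, on synthetic data, is to call `Z, W = toy_dge(X)` on a time-ordered feature matrix and then inspect `boundary_scores(Z)` for peaks. The multiplicative combination is one way to encode both kinds of similarity in a single graph; an additive or nearest-neighbour construction would be an equally plausible assumption.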