IEEE Trans Pattern Anal Mach Intell. 2016 Aug;38(8):1640-50. doi: 10.1109/TPAMI.2015.2481404. Epub 2015 Sep 23.
Automatic behavior analysis from video is a major topic in many areas of research, including computer vision, multimedia, robotics, biology, cognitive science, social psychology, psychiatry, and linguistics. Two major problems are of interest when analyzing behavior. First, we wish to automatically categorize observed behaviors into a discrete set of classes (i.e., classification), for example, to determine which word is produced in a video sequence of sign language. Second, we wish to understand the relevance of each behavioral feature in achieving this classification (i.e., decoding), for instance, to know which behavior variables are used to discriminate between the words apple and onion in American Sign Language (ASL). The present paper proposes to model behavior using a labeled graph, where the nodes define behavioral features and the edges carry labels specifying their temporal order (e.g., before, overlaps, start). In this approach, classification reduces to a simple labeled graph matching. Unfortunately, the complexity of labeled graph matching grows exponentially with the number of categories we wish to represent. Here, we derive a graph kernel to quickly and accurately compute this graph similarity. This approach is very general and can be plugged into any kernel-based classifier. Specifically, we derive a Labeled Graph Support Vector Machine (LGSVM) and a Labeled Graph Logistic Regressor (LGLR) that can be readily employed to discriminate between many actions (e.g., sign language concepts). The derived approach can also be readily used for decoding, yielding invaluable information for the understanding of a problem (e.g., to know how to teach a sign language). The derived algorithms achieve higher accuracy than state-of-the-art algorithms in a fraction of the time. We show experimental results on a variety of problems and datasets, including multimodal data.
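The labeled-graph idea described in the abstract can be illustrated with a minimal sketch. It is not the kernel derived in the paper: it assumes each behavioral feature is observed as a single time interval, uses a coarse three-way relation labeling, and substitutes a simple kernel that counts shared labeled edges. All feature names, intervals, and function names below are hypothetical.

```python
from itertools import combinations


def temporal_relation(a, b):
    """Coarse temporal label for intervals a=(start, end), b=(start, end).

    Assumes a starts no later than b. This is a simplification for
    illustration, not the relation set used in the paper.
    """
    if a[1] <= b[0]:
        return "before"
    if a[0] == b[0]:
        return "starts"
    return "overlaps"


def build_graph(events):
    """Turn {feature_name: (start, end)} into a set of labeled edges.

    Nodes are feature names; each edge is a triple
    (earlier_feature, relation_label, later_feature).
    """
    # Sort by start, then end, then name, so the pair order is deterministic.
    ordered = sorted(events.items(), key=lambda kv: (kv[1][0], kv[1][1], kv[0]))
    edges = set()
    for (name_a, iv_a), (name_b, iv_b) in combinations(ordered, 2):
        edges.add((name_a, temporal_relation(iv_a, iv_b), name_b))
    return frozenset(edges)


def edge_kernel(g1, g2):
    """Count labeled edges shared by two graphs.

    This equals the dot product of binary edge-indicator vectors, so the
    resulting Gram matrix is positive semidefinite.
    """
    return len(g1 & g2)


def gram_matrix(graphs):
    """Pairwise kernel values for a list of graphs."""
    return [[edge_kernel(gi, gj) for gj in graphs] for gi in graphs]


# Hypothetical observations: three video samples with made-up intervals.
samples = [
    {"handshape": (0, 4), "move_up": (2, 6), "head_nod": (6, 8)},
    {"handshape": (0, 3), "move_up": (1, 5), "head_nod": (7, 9)},
    {"handshape": (0, 2), "move_up": (3, 5), "head_nod": (3, 4)},
]
graphs = [build_graph(s) for s in samples]
K = gram_matrix(graphs)

for row in K:
    print(row)
```

A Gram matrix of this kind could in principle be handed to any classifier that accepts a precomputed kernel (for example, scikit-learn's `SVC(kernel="precomputed")`), which mirrors the abstract's point that the graph kernel plugs into kernel-based classifiers. Inspecting which labeled edges two classes share or lack is one rough way to approach the decoding question, though the paper's own procedure may differ.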