Alon Jonathan, Athitsos Vassilis, Yuan Quan, Sclaroff Stan
Computer Science Department, Boston University, Boston, MA 02215, USA.
IEEE Trans Pattern Anal Mach Intell. 2009 Sep;31(9):1685-99. doi: 10.1109/TPAMI.2008.203.
Within the context of hand gesture recognition, spatiotemporal gesture segmentation is the task of determining, in a video sequence, where the gesturing hand is located and when the gesture starts and ends. Existing gesture recognition methods typically assume either known spatial segmentation or known temporal segmentation, or both. This paper introduces a unified framework for simultaneously performing spatial segmentation, temporal segmentation, and recognition. In the proposed framework, information flows both bottom-up and top-down. A gesture can be recognized even when the hand location is highly ambiguous and when information about when the gesture begins and ends is unavailable. Thus, the method can be applied to continuous image streams where gestures are performed in front of moving, cluttered backgrounds. The proposed method consists of three novel contributions: a spatiotemporal matching algorithm that can accommodate multiple candidate hand detections in every frame, a classifier-based pruning framework that enables accurate and early rejection of poor matches to gesture models, and a subgesture reasoning algorithm that learns which gesture models can falsely match parts of other longer gestures. The performance of the approach is evaluated on two challenging applications: recognition of hand-signed digits gestured by users wearing short-sleeved shirts, in front of a cluttered background, and retrieval of occurrences of signs of interest in a video database containing continuous, unsegmented signing in American Sign Language (ASL).
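The first contribution, matching a gesture model against frames that each contain several candidate hand detections, can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation. It assumes, for illustration only, that a gesture model is a list of 2D hand positions, that each frame supplies a non-empty list of candidate 2D positions, and that the local cost is Euclidean distance. The function name `match_with_candidates` and the exact transition rules are hypothetical, and temporal spotting, pruning, and subgesture reasoning are left out.

```python
import math

def match_with_candidates(model, frames):
    """Align a model trajectory to frames that each hold several candidate
    hand locations. Returns (total_cost, chosen candidate index per frame).
    Illustrative assumption: features are 2D points, cost is Euclidean."""
    m, n = len(model), len(frames)
    # cost[i][j][k]: best cost of aligning model[:i+1] to frames[:j+1]
    # with model[i] matched to candidate k of frame j.
    cost = [[[math.inf] * len(frames[j]) for j in range(n)] for _ in range(m)]
    back = [[[None] * len(frames[j]) for j in range(n)] for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for k, cand in enumerate(frames[j]):
                local = math.dist(model[i], cand)
                if i == 0 and j == 0:
                    cost[i][j][k] = local
                    continue
                preds = []
                if j > 0:
                    # Moving to a new frame: any candidate of the previous
                    # frame may precede this one.
                    for kp in range(len(frames[j - 1])):
                        preds.append((cost[i][j - 1][kp], (i, j - 1, kp)))
                        if i > 0:
                            preds.append((cost[i - 1][j - 1][kp], (i - 1, j - 1, kp)))
                if i > 0:
                    # Staying in the same frame: keep the same candidate.
                    preds.append((cost[i - 1][j][k], (i - 1, j, k)))
                best, prev = min(preds, key=lambda p: p[0])
                cost[i][j][k] = best + local
                back[i][j][k] = prev
    # Pick the cheapest end state, then backtrack to recover which
    # candidate was selected in every frame (the spatial segmentation).
    k = min(range(len(frames[-1])), key=lambda c: cost[m - 1][n - 1][c])
    total = cost[m - 1][n - 1][k]
    chosen = [None] * n
    state = (m - 1, n - 1, k)
    while state is not None:
        i, j, k = state
        chosen[j] = k
        state = back[i][j][k]
    return total, chosen

# Toy usage: one true hand candidate and one clutter candidate per frame.
model = [(0, 0), (1, 1), (2, 2)]
frames = [[(5, 5), (0, 0)], [(1, 1), (9, 9)], [(7, 0), (2, 2)]]
total, chosen = match_with_candidates(model, frames)
print(total, chosen)  # 0.0 [1, 0, 1]
```

In this toy input the clutter candidates are far from the model trajectory, so the alignment picks the true hand position in each frame without being told which detection is correct. That is the sense in which the hand location need not be known in advance.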