University of South Florida, Tampa, FL, USA.
IEEE Trans Pattern Anal Mach Intell. 2010 Mar;32(3):462-77. doi: 10.1109/TPAMI.2009.26.
We consider two crucial problems in continuous sign language recognition from unaided video sequences. At the sentence level, we consider the movement epenthesis (me) problem; at the feature level, we consider the problem of hand segmentation and grouping. We construct a framework that handles both problems, based on an enhanced, nested version of the dynamic programming (DP) approach. To address movement epenthesis, a DP process employs a virtual me option that does not need explicit models. We call this the enhanced level building (eLB) algorithm. This formulation also allows the incorporation of grammar models. Nested within eLB is another DP that handles the problem of selecting among multiple hand candidates. We demonstrate our ideas on four American Sign Language data sets: one with a simple background, one with the signer wearing short sleeves, one with a complex background, and one across signers. We compare the performance with Conditional Random Field (CRF) and Latent Dynamic CRF (LDCRF) based approaches. The experiments show more than 40 percent improvement over the CRF and LDCRF approaches in terms of the frame labeling rate. We also show the flexibility of our approach when handling a changing context. Finally, we find a 70 percent improvement in sign recognition rate over the unenhanced DP matching algorithm that does not accommodate the me effect.
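The idea of a virtual me option inside a level building DP can be illustrated with a minimal sketch. It is not the paper's implementation: the 1-D features, the DTW matching cost, the fixed per-frame penalty `me_cost`, and the names `dtw` and `level_building` are all assumptions chosen for illustration, and the grammar models and the nested hand-candidate DP are left out.

```python
import math

ME = "me"  # virtual movement-epenthesis label; it has no model of its own


def dtw(template, segment):
    """Classic DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(template), len(segment)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(template[i - 1] - segment[j - 1])
            D[i][j] = step + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def level_building(frames, models, me_cost, max_levels):
    """Segment `frames` into at most `max_levels` labels (signs or me).

    Assumption: a me segment costs `me_cost` per frame, so it needs no model.
    Returns (total_cost, [(label, start, end), ...]) with end exclusive.
    """
    T = len(frames)
    # cost[l][t]: best cost of explaining frames[:t] with exactly l labels
    cost = [[math.inf] * (T + 1) for _ in range(max_levels + 1)]
    back = [[None] * (T + 1) for _ in range(max_levels + 1)]
    cost[0][0] = 0.0
    for level in range(1, max_levels + 1):
        for t in range(1, T + 1):
            for s in range(t):
                prev = cost[level - 1][s]
                if prev == math.inf:
                    continue
                seg = frames[s:t]
                # Candidates: every sign model, plus the virtual me label.
                cands = [(dtw(tpl, seg), name) for name, tpl in models.items()]
                cands.append((me_cost * len(seg), ME))
                for c, name in cands:
                    if prev + c < cost[level][t]:
                        cost[level][t] = prev + c
                        back[level][t] = (s, name)
    # Pick the number of levels with the lowest total cost (fewest on ties).
    best = min(range(1, max_levels + 1), key=lambda lv: cost[lv][T])
    labels, level, t = [], best, T
    while level > 0:
        s, name = back[level][t]
        labels.append((name, s, t))
        t, level = s, level - 1
    labels.reverse()
    return cost[best][T], labels


# Toy data: sign "A", two transition frames, then sign "B".
models = {"A": [0, 0, 0], "B": [5, 5, 5]}
frames = [0, 0, 0, 2, 3, 5, 5, 5]
total, labels = level_building(frames, models, me_cost=1.0, max_levels=4)
plain_total, plain_labels = level_building(frames, models, math.inf, 4)
```

In this toy run, `labels` assigns the two transition frames to `me`, while `plain_labels` (an infinite `me_cost`, which disables the me option) has to stretch the sign segments over them. The per-frame penalty plays the role of a threshold here; how such a value should be set for real data is outside the scope of this sketch.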