Ding Liya, Martinez Aleix M
Dept. of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210.
Image Vis Comput. 2009 Nov 1;27(12):1826-1844. doi: 10.1016/j.imavis.2009.02.005.
The manual signs in sign languages are generated and interpreted using three basic building blocks: handshape, motion, and place of articulation. When combined, these three components (together with palm orientation) uniquely determine the meaning of the manual sign. This means that pattern recognition techniques that employ only a subset of these components are inappropriate either for interpreting signs or for building automatic recognizers of the language. In this paper, we define an algorithm to model these three basic components from a single video sequence of two-dimensional pictures of a sign. Recognition of these three components is then combined to determine the class of the signs in the videos. Experiments are performed on a database of (isolated) American Sign Language (ASL) signs. The results demonstrate that, using semi-automatic detection, all three components can be reliably recovered from two-dimensional video sequences, allowing for an accurate representation and recognition of the signs.
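The combination step described in the abstract, in which recognition results for handshape, motion, and place of articulation jointly determine the sign class, can be sketched as follows. This is an illustrative toy example, not the authors' algorithm: the lexicon entries, component labels, and probability values are all hypothetical, and the fusion rule shown (multiplying per-component posteriors under a naive independence assumption) is only one simple way to combine component scores.

```python
# Illustrative sketch (not the method from the paper): fuse the outputs of
# three hypothetical component recognizers to classify an isolated sign.

# Hypothetical lexicon: sign -> (handshape, motion, place of articulation).
# Real ASL signs also depend on palm orientation, omitted here for brevity.
LEXICON = {
    "SIGN_A": ("5", "tap", "forehead"),
    "SIGN_B": ("5", "tap", "chin"),
    "SIGN_C": ("5", "twist", "chest"),
}

def combine_components(handshape_p, motion_p, place_p):
    """Score each lexicon entry by multiplying the posterior probabilities
    of its three components (naive independence assumption), and return
    the best-scoring sign together with all scores."""
    scores = {}
    for sign, (h, m, p) in LEXICON.items():
        scores[sign] = (handshape_p.get(h, 0.0)
                        * motion_p.get(m, 0.0)
                        * place_p.get(p, 0.0))
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical per-component posteriors from three separate recognizers.
best, scores = combine_components(
    {"5": 0.9},
    {"tap": 0.6, "twist": 0.4},
    {"forehead": 0.2, "chin": 0.7, "chest": 0.1},
)
# With these numbers, SIGN_B wins: 0.9 * 0.6 * 0.7 = 0.378.
```

The sketch illustrates the abstract's central point: because each sign is a distinct combination of the three components, a classifier that ignored place of articulation here could not separate SIGN_A from SIGN_B, which share handshape and motion.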