Foulds, Richard A.
Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ 07102-1982, USA.
IEEE Trans Neural Syst Rehabil Eng. 2004 Mar;12(1):65-72. doi: 10.1109/TNSRE.2003.821371.
Access to telecommunication systems by deaf users of sign language can be greatly enhanced by incorporating video conferencing in addition to text-based adaptations. However, the communication channel bandwidth is often challenged by the spatial requirements of representing the image in each frame and the temporal demands of preserving the movement trajectory with a sufficiently high frame rate. Effective systems must balance the portion of a limited channel bandwidth devoted to the quality of the individual frames against the frame rate in order to meet their intended needs. Conventional video conferencing technology generally addresses the limitations of channel capacity by drastically reducing the frame rate while preserving image quality. This produces a jerky image that disturbs the trajectories of the hands and arms, which are essential in sign language. In contrast, a sign language communication system must provide a frame rate capable of representing the kinematic bandwidth of human movement. Prototype sign language communication systems often attempt to maintain a high frame rate by reducing the quality of the image with lossy spatial compression. Unfortunately, this still requires a combined spatial and temporal data rate that exceeds the limited channel capacity of residential and wireless telephony. While spatial compression techniques have been effective in reducing the data, there has been no comparable compression of sign language in the temporal domain. Even modest reductions in the frame rate introduce perceptually disturbing flicker that decreases intelligibility. This paper introduces a method through which temporal compression on the order of 5:1 can be achieved. This is accomplished by decoupling the biomechanical or kinematic bandwidth necessary to represent continuous movements in sign language from the perceptually determined critical flicker frequency.
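The spatial/temporal trade-off described in the abstract can be sketched numerically. The following Python snippet is a minimal illustration, not an implementation of the paper's method: the per-frame bit cost and frame rates are assumed values chosen for illustration, and the 5:1 ratio is applied purely as a frame-rate reduction to show its effect on the required channel bitrate.

```python
# Illustrative sketch of the bandwidth budget for sign-language video.
# All numeric values are assumptions for illustration, not figures from the paper.

def required_bitrate(bits_per_frame: int, frames_per_second: float) -> float:
    """Channel bitrate (bits/s) = spatial cost per frame * temporal frame rate."""
    return bits_per_frame * frames_per_second

# Assume each spatially compressed frame costs 40 kbit.
BITS_PER_FRAME = 40_000

# Assume 30 fps is needed to capture the kinematic bandwidth of signing.
full_rate = required_bitrate(BITS_PER_FRAME, 30.0)            # 1,200,000 bits/s
temporal_5to1 = required_bitrate(BITS_PER_FRAME, 30.0 / 5.0)  # 240,000 bits/s

print(f"full rate:            {full_rate / 1e3:.0f} kbit/s")
print(f"5:1 temporal compression: {temporal_5to1 / 1e3:.0f} kbit/s")
```

Under these assumed numbers, the 5:1 temporal compression brings the data rate from 1.2 Mbit/s down to 240 kbit/s, closer to the capacity of the residential and wireless channels the abstract mentions; the paper's contribution is achieving this reduction without the flicker that naive frame dropping introduces.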