Guiard-Marigny T, Ostry D J
Institut de la Communication Parlée, Grenoble, France.
J Speech Lang Hear Res. 1997 Oct;40(5):1118-21. doi: 10.1044/jslhr.4005.1118.
With the development of precise three-dimensional motion measurement systems and powerful computers for three-dimensional graphical visualization, it is now possible to record and fully reconstruct human jaw motion. In this paper, we describe a visualization system for displaying three-dimensional jaw movements in speech. The system is designed to take as input jaw motion data obtained from one- or multi-dimensional recording systems. In the present application, jaw motion was recorded using an optoelectronic measurement system (Optotrak), and the corresponding speech signal was recorded using an analog input channel. The three orientation angles and three positions that describe the motion of the jaw as a rigid skeletal structure were derived from the empirical measurements. These six kinematic variables, which in mechanical terms fully account for jaw motion, act as inputs that drive a real-time three-dimensional animation of a skeletal jaw and upper skull. The visualization software enables the user to view jaw motion from any orientation and to change the viewpoint during the course of an utterance. Selected portions of an utterance may be replayed, and the speed of the visual display may be varied. The user may also display, along with the audio track, individual kinematic degrees of freedom or several degrees of freedom in combination. The system is presently being used as an educational tool and for research into audio-visual speech recognition. Interested researchers may obtain the software and source code free of charge from the authors.
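As an illustration of how six rigid-body variables can drive an animated skeletal model, the following minimal sketch applies three orientation angles and three positions to the vertices of a mesh, one frame at a time. It is not the authors' software. The `JawPose` container, the function names, the Z-Y-X (yaw, pitch, roll) rotation order, and the use of radians are all assumptions made for the example, since the abstract does not state an angle convention or units.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float, float]


class JawPose(NamedTuple):
    """Hypothetical container for one frame of six kinematic variables."""
    yaw: float    # rotation about the z axis, radians (assumed convention)
    pitch: float  # rotation about the y axis, radians (assumed convention)
    roll: float   # rotation about the x axis, radians (assumed convention)
    x: float      # translation along x
    y: float      # translation along y
    z: float      # translation along z


def rotation_matrix(pose: JawPose) -> List[List[float]]:
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) as a 3x3 nested list."""
    cz, sz = math.cos(pose.yaw), math.sin(pose.yaw)
    cy, sy = math.cos(pose.pitch), math.sin(pose.pitch)
    cx, sx = math.cos(pose.roll), math.sin(pose.roll)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform_point(pose: JawPose, p: Point) -> Point:
    """Rotate a point by the pose orientation, then add the pose translation."""
    r = rotation_matrix(pose)
    t = (pose.x, pose.y, pose.z)
    return tuple(
        sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)
    )


def pose_mesh(pose: JawPose, vertices: List[Point]) -> List[Point]:
    """Apply one frame's pose to every vertex of a rigid mesh."""
    return [transform_point(pose, v) for v in vertices]


if __name__ == "__main__":
    # Two made-up vertices and two made-up frames, for illustration only.
    mesh = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    frames = [
        JawPose(0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
        JawPose(0.0, 0.1, 0.0, 0.0, 0.0, -2.0),
    ]
    for frame in frames:
        print(pose_mesh(frame, mesh))
```

Because the jaw is treated as a rigid structure, a single rotation and translation per frame is enough to position every vertex. A renderer would redraw the transformed mesh for each recorded frame, and changing the viewpoint is a separate camera transform that leaves these six variables untouched.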