Department of Intensive Care and Intermediate Care, University Hospital RWTH Aachen, Pauwelsstraße 30, 52072, Aachen, Germany.
Research Area Information Theory and Systematic Design of Communication Systems, RWTH Aachen University, Kopernikusstraße 16, 52074, Aachen, Germany.
Sci Rep. 2023 Jan 17;13(1):928. doi: 10.1038/s41598-022-26155-5.
In this work, we propose a framework to enhance the communication abilities of speech-impaired patients in an intensive care setting via lip reading. Medical procedures such as a tracheotomy cause the patient to lose the ability to utter speech while having little to no impact on habitual lip movement. Consequently, we developed a framework that predicts silently spoken text by performing visual speech recognition, i.e., lip reading. In a two-stage architecture, frames of the patient's face are used to infer audio features as an intermediate prediction target, which are then used to predict the uttered text. To the best of our knowledge, this is the first approach to bring visual speech recognition into an intensive care setting. For this purpose, we recorded an audio-visual dataset in the intensive care unit (ICU) of the University Hospital RWTH Aachen, with a language corpus hand-picked by experienced clinicians to be representative of their day-to-day routine. With a word error rate of 6.3%, the trained system reaches sufficient overall performance to significantly improve the quality of communication between patients and clinicians or relatives.
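A minimal sketch of the two-stage idea described in the abstract, written in PyTorch: stage one maps face/mouth frames to audio features (mel-spectrogram frames are assumed here as the intermediate target), and stage two decodes those features into text via a CTC-style character output. All module names, layer sizes, and the choice of mel features and CTC decoding are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class VideoToAudioFeatures(nn.Module):
    """Stage 1 (sketch): sequence of mouth-region frames -> audio features
    (e.g., mel-spectrogram frames) used as an intermediate prediction target."""

    def __init__(self, n_mels: int = 80):
        super().__init__()
        # Per-frame visual encoder (grayscale mouth crops assumed).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Temporal model over the per-frame embeddings.
        self.temporal = nn.GRU(64, 128, batch_first=True, bidirectional=True)
        self.to_mels = nn.Linear(256, n_mels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 1, height, width)
        b, t = frames.shape[:2]
        x = self.frame_encoder(frames.flatten(0, 1)).flatten(1)  # (b*t, 64)
        x, _ = self.temporal(x.view(b, t, -1))                   # (b, t, 256)
        return self.to_mels(x)                                   # (b, t, n_mels)


class AudioFeaturesToText(nn.Module):
    """Stage 2 (sketch): decode predicted audio features into text, here with
    log-probabilities over a character vocabulary suitable for a CTC loss."""

    def __init__(self, n_mels: int = 80, vocab_size: int = 40):
        super().__init__()
        self.encoder = nn.GRU(n_mels, 128, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(256, vocab_size)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        x, _ = self.encoder(mels)
        return self.classifier(x).log_softmax(-1)  # (b, t, vocab)


if __name__ == "__main__":
    stage1, stage2 = VideoToAudioFeatures(), AudioFeaturesToText()
    dummy_frames = torch.randn(2, 75, 1, 64, 64)  # 2 clips, 75 frames each
    mels = stage1(dummy_frames)                   # (2, 75, 80)
    log_probs = stage2(mels)                      # (2, 75, 40)
    print(mels.shape, log_probs.shape)
```

In such a setup the two stages can be trained separately (stage one against ground-truth audio features, stage two against transcripts) or jointly end-to-end; the abstract does not specify which, so this sketch leaves the training regime open.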
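For reference, the word error rate (WER) reported above is conventionally defined as the word-level edit distance between hypothesis and reference, normalized by the number of reference words: WER = (substitutions + deletions + insertions) / reference length. A minimal illustrative implementation (the function name and example strings are assumptions):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table: d[i][j] = edit distance between
    # the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("turn on the light", "turn of the light"))  # 0.25
```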