University of Glasgow, James Watt School of Engineering, Glasgow, G12 8QQ, UK.
School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, G4 0BA, UK.
Nat Commun. 2022 Sep 7;13(1):5168. doi: 10.1038/s41467-022-32231-1.
Lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most lip-reading technologies developed so far are camera-based and require video recording of the target. However, these technologies have well-known limitations under occlusion and poor ambient lighting, and they raise serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become the norm. This paper aims to overcome the fundamental limitations of camera-based systems by proposing a radio frequency (RF) based lip-reading framework that can read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF sensing-based lip-reading. A dataset comprising the vowels A, E, I, O, U and an empty class (static/closed lips) is collected using both technologies while a face mask is worn. The collected data is used to train machine learning (ML) and deep learning (DL) models. A high classification accuracy of 95% is achieved on the Wi-Fi data using neural network (NN) models. Moreover, similar accuracy is achieved by the VGG16 deep learning model on the collected radar dataset.