Pushing the limits of remote RF sensing by reading lips under the face mask.

Affiliations

University of Glasgow, James Watt School of Engineering, Glasgow, G12 8QQ, UK.

School of Computing, Engineering and Built Environment, Glasgow Caledonian University, Glasgow, G4 0BA, UK.

Publication information

Nat Commun. 2022 Sep 7;13(1):5168. doi: 10.1038/s41467-022-32231-1.

DOI:10.1038/s41467-022-32231-1

PMID:36071056

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9452506/

Abstract

Lip-reading has become an important research challenge in recent years. The goal is to recognise speech from lip movements. Most lip-reading technologies developed so far are camera-based and require video recording of the target. However, these technologies have well-known limitations: they are sensitive to occlusion and ambient lighting, and they raise serious privacy concerns. Furthermore, vision-based technologies are not useful for multi-modal hearing aids in the coronavirus (COVID-19) environment, where face masks have become the norm. This paper aims to overcome the fundamental limitations of camera-based systems by proposing a radio frequency (RF) based lip-reading framework that can read lips under face masks. The framework employs Wi-Fi and radar technologies as enablers of RF-sensing-based lip-reading. A dataset comprising the vowels A, E, I, O, U and an empty class (static/closed lips) is collected using both technologies, with a face mask in place. The collected data is used to train machine learning (ML) and deep learning (DL) models. A classification accuracy of 95% is achieved on the Wi-Fi data using neural network (NN) models. Moreover, similar accuracy is achieved by the VGG16 deep learning model on the collected radar-based dataset.
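
To make the six-class setup concrete, the following is a minimal sketch of the classification task only. It is not the authors' pipeline: the abstract does not describe the features or the model details, so the feature vectors here are synthetic stand-ins for Wi-Fi or radar measurements, and a simple nearest-centroid classifier stands in for the NN and VGG16 models. The names `DIM`, `make_sample`, `fit_centroids` and `predict` are hypothetical.

```python
import random

# The six classes named in the abstract: five vowels plus closed/static lips.
LABELS = ["A", "E", "I", "O", "U", "empty"]
DIM = 18  # hypothetical feature length, chosen only for illustration


def make_sample(class_index, rng, noise=0.1):
    """Synthetic feature vector: a class-specific pattern plus Gaussian noise.

    This is an assumption for illustration, not real Wi-Fi or radar data.
    """
    return [
        (1.0 if j % len(LABELS) == class_index else 0.0) + rng.gauss(0.0, noise)
        for j in range(DIM)
    ]


def make_dataset(n_per_class, rng):
    """Build a shuffled list of (features, label) pairs."""
    data = []
    for k, label in enumerate(LABELS):
        for _ in range(n_per_class):
            data.append((make_sample(k, rng), label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Average the training vectors of each class (nearest-centroid training)."""
    sums = {label: [0.0] * DIM for label in LABELS}
    counts = {label: 0 for label in LABELS}
    for x, y in train:
        counts[y] += 1
        for j, value in enumerate(x):
            sums[y][j] += value
    return {label: [s / counts[label] for s in sums[label]] for label in LABELS}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == y for x, y in samples)
    return correct / len(samples)


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_dataset(40, rng)   # 40 synthetic samples per class
test = make_dataset(10, rng)    # 10 held-out synthetic samples per class
centroids = fit_centroids(train)
print(f"held-out accuracy: {accuracy(centroids, test):.2f}")
```

Because the synthetic classes are well separated by construction, the accuracy printed here says nothing about real RF data; the sketch only illustrates the train, predict and evaluate steps for a six-way vowel classifier.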
