Artificial intelligence enabled smart mask for speech recognition for future hearing devices.

Author information

Hameed Hira, Usman Muhammad, Kazim Jalil Ur Rehman, Assaleh Khaled, Arshad Kamran, Hussain Amir, Imran Muhammad, Abbasi Qammer H

Affiliations

James Watt School of Engineering, University of Glasgow, Glasgow, G12 8QQ, UK.

University of Engineering & Technology, UETP, Peshawar, Pakistan.

Publication information

Sci Rep. 2024 Dec 3;14(1):30112. doi: 10.1038/s41598-024-81904-y.

DOI:10.1038/s41598-024-81904-y

PMID:39627338

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11614889/

Abstract

In recent years, lip-reading has emerged as a significant research challenge. The aim is to recognise speech by analysing lip movements. Most lip-reading technologies are based on cameras or wearable devices. However, camera-based systems suffer from well-known occlusion and ambient-lighting limitations and raise privacy concerns, while wearable devices cause discomfort for subjects and disturb their daily routines. Furthermore, in the era of coronavirus disease (COVID-19), when face masks are the norm, vision-based and wearable-based technologies for hearing aids are ineffective. To address these fundamental limitations of camera-based and wearable-based systems, this paper proposes a lip-reading framework built on a Radio Frequency Identification (RFID)-based smart mask that can read lips under a face mask, enabling effective speech recognition and improving conversational accessibility for individuals with hearing impairment. The system uses RFID technology to make Radio Frequency (RF) sensing-based lip-reading possible. The smart RFID face mask is used to collect a dataset with three classes: vowels (A, E, I, O, U), consonants (F, G, M, S), and words (Fish, Goat, Meal, Moon, Snake). The collected data are fed into well-known machine-learning models for classification. High classification accuracy is achieved on both the individual class datasets and the combined dataset. On the combined RFID dataset, the Random Forest model achieves a classification accuracy of 80%.
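To make the difference between the individual class datasets and the combined dataset concrete, the following Python sketch shows one plausible way to pool the vowel, consonant, and word samples into a single 14-class problem, hold out a stratified test set, and score accuracy. It is a minimal illustration under stated assumptions, not the authors' code: each sample is assumed to be a `(features, label)` pair, the helper names (`combine_categories`, `stratified_split`, `accuracy`) are hypothetical, the features are placeholders rather than real RFID measurements, and the Random Forest classifier itself is left out.

```python
import random

# Class labels as listed in the abstract; 5 + 4 + 5 = 14 labels in total.
CATEGORIES = {
    "vowels": ["A", "E", "I", "O", "U"],
    "consonants": ["F", "G", "M", "S"],
    "words": ["Fish", "Goat", "Meal", "Moon", "Snake"],
}


def combine_categories(datasets):
    """Pool per-category datasets into one multi-class dataset (hypothetical helper).

    `datasets` maps a category name to a list of (features, label) pairs.
    Labels are assumed to be disjoint across categories, so they are reused
    unchanged as the classes of the combined problem; a clash raises an error.
    """
    seen = {}
    combined = []
    for category, samples in datasets.items():
        for features, label in samples:
            owner = seen.setdefault(label, category)
            if owner != category:
                raise ValueError(f"label {label!r} appears in {owner} and {category}")
            combined.append((features, label))
    return combined


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Hold out roughly `test_fraction` of each label's samples for testing."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append((features, label))
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # At least one test sample per label; the rest go to training.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Placeholder features only: 10 dummy samples per label, NOT real RFID data.
    datasets = {
        category: [([0.0, 0.0], label) for label in labels for _ in range(10)]
        for category, labels in CATEGORIES.items()
    }
    combined = combine_categories(datasets)
    train, test = stratified_split(combined, test_fraction=0.2)
    print(len({label for _, label in combined}), len(train), len(test))  # 14 112 28
```

In a real pipeline the placeholder vectors would be replaced by features extracted from the RFID signals, and `train` would be passed to a classifier such as a Random Forest. Splitting per label keeps every class represented in the test set roughly in proportion to its size, so the accuracy on the combined dataset is not dominated by whichever category happens to have the most samples.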