Research on lip recognition algorithm based on MobileNet + attention-GRU.

Authors

Lu Yuanyao, Li Kexin

Affiliation

School of Information, North China University of Technology, Beijing 100144, China.

Publication

Math Biosci Eng. 2022 Sep 15;19(12):13526-13540. doi: 10.3934/mbe.2022631.

DOI:10.3934/mbe.2022631
PMID:36654056
Abstract

With the development of deep learning and artificial intelligence, lip recognition is in high demand in computer vision and human-machine interaction. In particular, using automatic lip recognition to help hard-of-hearing people communicate in social interactions and improve their pronunciation is one of the most promising applications of artificial intelligence in healthcare and rehabilitation. Lip recognition identifies what a speaker is saying by analyzing the dynamic motion of the lips. Current lip recognition research focuses mainly on algorithms and computational performance, and relatively few studies address practical applications. To help close this gap, this paper studies a deep learning-based lip recognition application system, namely the design and development of a speech correction system for the hearing impaired, with the aim of laying a foundation for the broader deployment of automatic lip recognition technology. First, we used the lightweight MobileNet network to extract spatial features from the raw lip images; the extracted features are robust and fault-tolerant. Then, a gated recurrent unit (GRU) network was used to further extract the 2D image features and temporal features of the lip. To further improve the recognition rate, we added an attention mechanism on top of the GRU network, and extensive experiments demonstrate the performance of the resulting model. In addition, we built a lip similarity matching system that helps hearing-impaired people learn and correct their mouth shapes for correct pronunciation. The experiments show that the system is highly feasible and effective.
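The recognition pipeline in the abstract (per-frame spatial features, then a GRU over time, then attention) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation. The per-frame vectors stand in for MobileNet outputs, and the weights are random rather than trained. The attention form is also an assumption, because the abstract does not say which attention variant is used. Here each hidden state is scored by a dot product with a query vector, and the states are then pooled using softmax weights. The names `GRUCell` and `attention_gru_encode` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def vtanh(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GRUCell:
    """Hypothetical, untrained GRU cell with small random weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Gates: z = update, r = reset, n = candidate hidden state.
        self.W = {g: init(hidden_dim, input_dim) for g in "zrn"}
        self.U = {g: init(hidden_dim, hidden_dim) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = sigmoid(vadd(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"]))
        r = sigmoid(vadd(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"]))
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = vtanh(vadd(matvec(self.W["n"], x), matvec(self.U["n"], reset_h), self.b["n"]))
        # New state interpolates between the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def attention_gru_encode(frames: List[Vector], cell: GRUCell, query: Vector) -> Tuple[Vector, Vector]:
    """Run the GRU over per-frame features, then attention-pool the hidden states."""
    h = [0.0] * cell.hidden_dim
    states: List[Vector] = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    # Dot-product score of each hidden state against the (assumed) query vector.
    weights = softmax([sum(q * s for q, s in zip(query, st)) for st in states])
    context = [sum(w * st[i] for w, st in zip(weights, states)) for i in range(cell.hidden_dim)]
    return context, weights


# Usage: 5 frames of 3-dim features standing in for MobileNet outputs.
frames = [[0.1 * t, -0.05 * t, 0.2] for t in range(5)]
cell = GRUCell(input_dim=3, hidden_dim=4)
context, weights = attention_gru_encode(frames, cell, query=[1.0, 0.0, 0.0, 0.0])
print([round(w, 3) for w in weights])
```

The attention weights sum to 1, so frames whose hidden states align with the query contribute more to the pooled context vector. In a trained model, that vector would typically feed a classifier over the word or phrase vocabulary.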
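The abstract does not say how the lip similarity matching system compares a learner's mouth shapes with the correct ones. The following sketch shows one plausible approach, purely as an assumption. It aligns the learner's per-frame feature sequence with a reference sequence using dynamic time warping (DTW) over cosine distances, so that speaking faster or slower is not penalized. It then maps the normalized alignment cost to a score in [0, 1]. `lip_similarity` and its helpers are hypothetical names.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def cosine_distance(a: Frame, b: Frame) -> float:
    """1 - cosine similarity; zero vectors are treated as unrelated (distance 1.0)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def dtw_distance(ref: List[Frame], user: List[Frame]) -> float:
    """Dynamic time warping cost between two frame sequences, normalized by n + m."""
    n, m = len(ref), len(user)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(ref[i - 1], user[j - 1])
            # Best predecessor: advance the reference only, the user only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def lip_similarity(ref: List[Frame], user: List[Frame]) -> float:
    """Map the normalized DTW cost to a score clamped to [0, 1] (1 = same shapes)."""
    return min(1.0, max(0.0, 1.0 - dtw_distance(ref, user)))


reference = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slower = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # same shapes, first one held longer
different = [[0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]]
print(round(lip_similarity(reference, slower), 3))
print(round(lip_similarity(reference, different), 3))
```

Because DTW can stretch or compress time, a reference sequence and a slowed-down copy of it score close to 1, while a sequence with different mouth shapes scores lower.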

Similar articles

1. Research on lip recognition algorithm based on MobileNet + attention-GRU.
   Math Biosci Eng. 2022 Sep 15;19(12):13526-13540. doi: 10.3934/mbe.2022631.
2. Discriminative analysis of lip motion features for speaker identification and speech-reading.
   IEEE Trans Image Process. 2006 Oct;15(10):2879-91. doi: 10.1109/tip.2006.877528.
3. Human Activity Recognition Using Cascaded Dual Attention CNN and Bi-Directional GRU Framework.
   J Imaging. 2023 Jun 26;9(7):130. doi: 10.3390/jimaging9070130.
4. Motion prediction using brain waves based on artificial intelligence deep learning recurrent neural network.
   J Exerc Rehabil. 2023 Aug 22;19(4):219-227. doi: 10.12965/jer.2346242.121. eCollection 2023 Aug.
5. [Development and evaluation of a deep learning algorithm for German word recognition from lip movements].
   HNO. 2022 Jun;70(6):456-465. doi: 10.1007/s00106-021-01143-9. Epub 2022 Jan 13.
6. Object recognition in medical images via anatomy-guided deep learning.
   Med Image Anal. 2022 Oct;81:102527. doi: 10.1016/j.media.2022.102527. Epub 2022 Jun 25.
7. Music Score Recognition Method Based on Deep Learning.
   Comput Intell Neurosci. 2022 Jul 7;2022:3022767. doi: 10.1155/2022/3022767. eCollection 2022.
8. End-to-End Lip-Reading Open Cloud-Based Speech Architecture.
   Sensors (Basel). 2022 Apr 12;22(8):2938. doi: 10.3390/s22082938.
9. Analysis Model of Spoken English Evaluation Algorithm Based on Intelligent Algorithm of Internet of Things.
   Comput Intell Neurosci. 2022 Mar 27;2022:8469945. doi: 10.1155/2022/8469945. eCollection 2022.
10. A Spatial-Temporal Graph Model for Pronunciation Feature Prediction of Chinese Poetry.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10294-10308. doi: 10.1109/TNNLS.2022.3165554. Epub 2023 Nov 30.