Recognizing articulatory gestures from speech for robust speech recognition.

Affiliation

Speech Technology and Research Laboratory, SRI International, Menlo Park, California 94025, USA.

Publication details

J Acoust Soc Am. 2012 Mar;131(3):2270-87. doi: 10.1121/1.3682038.

DOI: 10.1121/1.3682038
PMID: 22423722
Abstract

Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, so it must be estimated from the speech signal. This study describes a system that recognizes articulatory gestures from speech and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures into a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use a three-stage evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that using estimated tract-variable time functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms, and word recognition experiments revealed that the recognized gestures can improve the noise robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained, and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.
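The abstract describes gesture recognition as recovering gestural activations together with their dynamic parameters. As a rough illustration only, the sketch below assumes the task-dynamic formulation commonly associated with Articulatory Phonology, in which an active gesture drives a tract variable as a critically damped point attractor defined by a target and a stiffness. The `Gesture` class, `simulate_tract_variable`, the "freeze when inactive" rule, and all parameter values are hypothetical simplifications, not the authors' system. Recognition, in this framing, is the inverse problem: inferring activation intervals, targets, and stiffnesses from the acoustic signal.

```python
import math
from dataclasses import dataclass


@dataclass
class Gesture:
    """Hypothetical gesture record: an activation interval plus dynamic parameters."""
    onset: float      # activation onset in seconds
    offset: float     # activation offset in seconds
    target: float     # tract-variable target (arbitrary units)
    stiffness: float  # k in x'' = -k (x - target) - b x'


def simulate_tract_variable(gesture, x0=0.0, duration=0.5, dt=0.001):
    """Integrate one tract variable driven by one gesture.

    While the gesture is active, the tract variable follows a critically
    damped point attractor (unit mass, b = 2 * sqrt(k)), so it approaches
    the target without overshoot. Outside the activation interval the
    tract variable simply holds its position (a simplifying assumption).
    """
    x, v = x0, 0.0
    b = 2.0 * math.sqrt(gesture.stiffness)  # critical damping for unit mass
    trajectory = []
    for i in range(int(round(duration / dt))):
        t = i * dt
        if gesture.onset <= t < gesture.offset:
            acc = -gesture.stiffness * (x - gesture.target) - b * v
        else:
            acc = 0.0
            v = 0.0  # no active gesture: freeze motion
        # Semi-implicit Euler: update velocity first, then position.
        v += acc * dt
        x += v * dt
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    g = Gesture(onset=0.05, offset=0.45, target=1.0, stiffness=400.0)
    traj = simulate_tract_variable(g)
    print(f"value at end of utterance: {traj[-1]:.4f}")
```

Critical damping is used here because it gives a smooth, non-oscillating approach to the target. The semi-implicit Euler step is chosen for simplicity; with a small `dt` it stays stable for the stiffness used in the usage example.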

Similar articles

1. Recognizing articulatory gestures from speech for robust speech recognition.
J Acoust Soc Am. 2012 Mar;131(3):2270-87. doi: 10.1121/1.3682038.
2. A comparison of automatic and human speech recognition in null grammar.
J Acoust Soc Am. 2012 Mar;131(3):EL256-61. doi: 10.1121/1.3684744.
3. A modular architecture for articulatory synthesis from gestural specification.
J Acoust Soc Am. 2019 Dec;146(6):4458. doi: 10.1121/1.5139413.
4. The role of iconic gestures in speech disambiguation: ERP evidence.
J Cogn Neurosci. 2007 Jul;19(7):1175-92. doi: 10.1162/jocn.2007.19.7.1175.
5. Gestural recovery and the role of forward and reversed syllabic repetitions as stuttering inhibitors in adults.
Neurosci Lett. 2004 Jun 10;363(2):144-9. doi: 10.1016/j.neulet.2004.03.060.
6. Some notes on syllable structure in articulatory phonology.
Phonetica. 1988;45(2-4):140-55. doi: 10.1159/000261823.
7. Integration of iconic gestures and speech in left superior temporal areas boosts speech comprehension under adverse listening conditions.
Neuroimage. 2010 Jan 1;49(1):875-84. doi: 10.1016/j.neuroimage.2009.08.058. Epub 2009 Sep 4.
8. The benefit of gestures during communication: evidence from hearing and hearing-impaired individuals.
Cortex. 2012 Jul;48(7):857-70. doi: 10.1016/j.cortex.2011.02.007. Epub 2011 Feb 13.
9. Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.
J Acoust Soc Am. 2011 Jan;129(1):388-403. doi: 10.1121/1.3514525.
10. fMRI and acoustic analyses reveal neural correlates of gestural complexity and articulatory effort within bilateral inferior frontal gyrus during speech production.
Neuropsychologia. 2019 Sep;132:107129. doi: 10.1016/j.neuropsychologia.2019.107129. Epub 2019 Jun 22.

Cited by

1. The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation.
Phonetica. 2024 Sep 5;81(5):451-508. doi: 10.1515/phon-2024-0015. Print 2024 Oct 28.
2. The Type of Noise Influences Quality Ratings for Noisy Speech in Hearing Aid Users.
J Speech Lang Hear Res. 2020 Dec 14;63(12):4300-4313. doi: 10.1044/2020_JSLHR-20-00156. Epub 2020 Nov 30.
3. The Role of Temporal Modulation in Sensorimotor Interaction.
Front Psychol. 2019 Dec 6;10:2608. doi: 10.3389/fpsyg.2019.02608. eCollection 2019.
4. Methods for eliciting, annotating, and analyzing databases for child speech development.
Comput Speech Lang. 2017 Sep;45:278-299. doi: 10.1016/j.csl.2017.02.010.
5. Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories.
Comput Speech Lang. 2016 Mar 1;36:330-346. doi: 10.1016/j.csl.2015.03.004. Epub 2015 Mar 21.
6. Spatio-temporal articulatory movement primitives during speech production: extraction, interpretation, and validation.
J Acoust Soc Am. 2013 Aug;134(2):1378-94. doi: 10.1121/1.4812765.
7. Modeling speech imitation and ecological learning of auditory-motor maps.
Front Psychol. 2013 Jun 27;4:364. doi: 10.3389/fpsyg.2013.00364. Print 2013.