Modeling speech imitation and ecological learning of auditory-motor maps.

Affiliation

Mirror Neurons and Interaction Lab, Robotics, Brain and Cognitive Sciences Department, Istituto Italiano di Tecnologia Genova, Italy.

Publication information

Front Psychol. 2013 Jun 27;4:364. doi: 10.3389/fpsyg.2013.00364. Print 2013.

DOI:10.3389/fpsyg.2013.00364
PMID:23818883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3694210/
Abstract

Classical models of speech consider an antero-posterior distinction between perceptive and productive functions. However, the selective alteration of neural activity in speech motor centers, via transcranial magnetic stimulation, was shown to affect speech discrimination. On the automatic speech recognition (ASR) side, recognition systems have classically relied solely on acoustic data, achieving rather good performance in optimal listening conditions. The limitations of current ASR become evident mainly in realistic use of such systems. These limitations can be partly reduced by using normalization strategies that minimize inter-speaker variability by either explicitly removing speakers' peculiarities or adapting different speakers to a reference model. In this paper we aim to model a motor-based imitation learning mechanism in ASR. We tested the utility of a speaker normalization strategy that uses motor representations of speech and compared it with strategies that ignore the motor domain. Specifically, we first trained a regressor through state-of-the-art machine learning techniques to build an auditory-motor mapping, in a sense mimicking a human learner that tries to reproduce utterances produced by other speakers. This auditory-motor mapping maps the speech acoustics of a speaker into the motor plans of a reference speaker. Since only speech acoustics are available during recognition, the mapping is necessary to "recover" motor information. Subsequently, in a phone classification task, we tested the system on either one of the speakers used during training or a new one. Results show that in both cases the motor-based speaker normalization strategy slightly but significantly outperforms all other strategies that take only acoustics into account.
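The pipeline described in the abstract (learn an auditory-motor map, recover motor features from acoustics, then classify phones) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the data are synthetic, ridge regression stands in for the unspecified regressor, a nearest-centroid rule stands in for the phone classifier, and all names (`speak`, `fit_ridge`, `motor_ref`, and so on) are hypothetical. It is not the authors' implementation and makes no claim about which feature set performs better.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PHONES, MOTOR_DIM, ACOUSTIC_DIM, N_FRAMES = 4, 3, 6, 200  # toy sizes (assumed)

def add_bias(X):
    """Append a constant column so the linear map can learn an offset."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ridge(X, Y, lam=1e-2):
    """Closed-form ridge regression from X (acoustics) to Y (motor plans)."""
    Xb = add_bias(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ Y)

def apply_map(W, X):
    """Recover motor features from acoustics with a fitted map W."""
    return add_bias(X) @ W

def fit_centroids(X, y):
    """Nearest-centroid classifier: one mean vector per phone label."""
    labels = np.unique(y)
    return labels, np.stack([X[y == k].mean(axis=0) for k in labels])

def classify(model, X):
    labels, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]

def speak(motor, rng):
    """Synthetic speaker: a speaker-specific linear image of the motor plans."""
    mix = rng.normal(size=(MOTOR_DIM, ACOUSTIC_DIM))
    offset = rng.normal(size=ACOUSTIC_DIM)
    noise = 0.1 * rng.normal(size=(motor.shape[0], ACOUSTIC_DIM))
    return motor @ mix + offset + noise

# Motor plans of the reference speaker for a sequence of phone frames.
phones = rng.integers(0, N_PHONES, size=N_FRAMES)
protos = rng.normal(size=(N_PHONES, MOTOR_DIM))
motor_ref = protos[phones] + 0.1 * rng.normal(size=(N_FRAMES, MOTOR_DIM))

# Three training speakers and one unseen speaker produce the same utterances.
train_acoustics = [speak(motor_ref, rng) for _ in range(3)]
test_acoustics = speak(motor_ref, rng)

X_train = np.vstack(train_acoustics)
M_train = np.vstack([motor_ref] * 3)  # target: reference speaker's motor plans
y_train = np.tile(phones, 3)

# Auditory-motor map: any speaker's acoustics -> reference motor plans.
W = fit_ridge(X_train, M_train)
motor_train_hat = apply_map(W, X_train)
motor_test_hat = apply_map(W, test_acoustics)

# Phone classification with acoustics only vs. acoustics + recovered motor.
acoustic_model = fit_centroids(X_train, y_train)
joint_model = fit_centroids(np.hstack([X_train, motor_train_hat]), y_train)

acc_acoustic = float((classify(acoustic_model, test_acoustics) == phones).mean())
acc_joint = float(
    (classify(joint_model, np.hstack([test_acoustics, motor_test_hat])) == phones).mean()
)
print(f"acoustic only: {acc_acoustic:.2f}, acoustic + recovered motor: {acc_joint:.2f}")
```

The point of the sketch is the data flow: motor targets are needed only when fitting `W`, while at test time the motor features are recovered from acoustics alone, which mirrors the constraint stated in the abstract.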

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ba/3694210/9a82496c08fd/fpsyg-04-00364-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ba/3694210/89243c318a0a/fpsyg-04-00364-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e3ba/3694210/7c2f786a5b56/fpsyg-04-00364-g0002.jpg
