Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures.

Affiliation

Signal Analysis and Interpretation Laboratory, Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA.

Publication information

J Acoust Soc Am. 2011 Jun;129(6):4014-22. doi: 10.1121/1.3573987.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3135153/

Abstract

Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.
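The analysis described in the abstract can be illustrated with a minimal sketch: build candidate filterbanks from a single parameter, then pick the candidate whose output shares the most information with the articulatory gestures, which is equivalent to leaving the least uncertainty about them. The abstract does not specify the filterbank parametrization or the information estimator, so everything below is an assumption made for illustration: the `log1p`-based warping family, the plug-in estimator over discrete symbols, and the names `warped_center_freqs`, `mutual_information`, and `select_parameter` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def warped_center_freqs(alpha, n_filters, f_max):
    """Center frequencies (Hz) equally spaced on a hypothetical warped axis.

    Assumed warping: warp(f) = log(1 + alpha * f). Small alpha gives nearly
    linear spacing; large alpha gives nearly logarithmic spacing. This family
    is illustrative only, not the paper's parametrization.
    """
    w_max = math.log1p(alpha * f_max)
    # Invert the warping at equally spaced points on the warped axis.
    return [math.expm1(w_max * k / n_filters) / alpha
            for k in range(1, n_filters + 1)]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete symbols."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_parameter(candidates, encode, acoustic, gestures):
    """Return the candidate whose encoding is most informative about gestures.

    `encode(param, sample)` is a caller-supplied (hypothetical) function that
    maps one acoustic sample to a discrete symbol under a given parameter.
    """
    return max(
        candidates,
        key=lambda a: mutual_information(
            [encode(a, s) for s in acoustic], gestures),
    )
```

Because I(A;F) = H(A) - H(A|F) and H(A) does not depend on the filterbank, maximizing the mutual information between gestures A and features F is the same as minimizing the remaining uncertainty H(A|F). A real analysis would need continuous-valued estimators and actual articulatory recordings; the discrete plug-in estimate here only shows the selection logic.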
