Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints.

Authors

Roberts Brian, Summers Robert J, Bailey Peter J

Affiliations

Psychology, School of Life and Health Sciences.

Department of Psychology, University of York.

Publication information

J Exp Psychol Hum Percept Perform. 2014 Aug;40(4):1507-25. doi: 10.1037/a0036629. Epub 2014 May 19.

DOI:10.1037/a0036629

PMID:24842068

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4120706/

Abstract

How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1 - F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.
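
The two contour manipulations described in the abstract (scaling the depth of variation and inverting a contour about its geometric mean) can be illustrated with a minimal sketch. It assumes both operations act on a log-frequency axis, which is what inversion about a geometric mean implies but which the abstract does not state for the depth scaling. The function names and the sample F2 track are hypothetical, not taken from the study.

```python
import math

def geometric_mean(freqs):
    """Geometric mean of a frequency contour (Hz)."""
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))

def scale_depth(freqs, depth):
    """Scale excursions about the geometric mean on a log-frequency axis.

    Assumption: depth=1.0 leaves the contour unchanged, depth=0.5 halves
    each log-frequency excursion, depth=0.0 gives a constant contour.
    """
    gm = geometric_mean(freqs)
    return [gm * (f / gm) ** depth for f in freqs]

def invert_contour(freqs):
    """Mirror the contour about its geometric mean (log-frequency axis)."""
    gm = geometric_mean(freqs)
    return [gm * gm / f for f in freqs]

# Hypothetical F2 track in Hz, for illustration only.
f2 = [1200.0, 1500.0, 1800.0, 1400.0, 1100.0]

f2c_100 = invert_contour(f2)          # inverted competitor, full depth
f2c_50 = scale_depth(f2c_100, 0.5)    # same pattern, half depth
f2c_0 = scale_depth(f2c_100, 0.0)     # constant-frequency competitor
```

Under this assumption, inversion preserves the geometric mean, so the competitor occupies the same frequency region as the target F2 while moving in the opposite direction at every point.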
