School of Communication Sciences & Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA.
J Acoust Soc Am. 2019 Jul;146(1):60. doi: 10.1121/1.5114822.
Speech perception requires grouping acoustic information into meaningful linguistic-phonetic units via categorical perception (CP). Beyond shrinking observers' perceptual space, CP might aid degraded speech perception if categories are more resistant to noise than surface acoustic features. Combining audiovisual (AV) cues also enhances speech recognition, particularly in noisy environments. This study investigated the degree to which visual cues from a talker (i.e., mouth movements) aid speech categorization amid noise interference by measuring participants' identification of clear and noisy speech (0 dB signal-to-noise ratio) presented in auditory-only or combined AV modalities (i.e., A, A+noise, AV, AV+noise conditions). As expected, auditory noise weakened (i.e., shallower identification slopes) and slowed speech categorization. Interestingly, additional viseme cues largely counteracted noise-related decrements in performance and stabilized classification speeds in both clear and noise conditions, suggesting more precise acoustic-phonetic representations with multisensory information. Results are parsimoniously described under a signal detection theory framework and by a reduction (visual cues) and increase (noise) in the precision of perceptual object representation, which were not due to lapses of attention or guessing. Collectively, findings show that (i) mapping sounds to categories aids speech perception in "cocktail party" environments; and (ii) visual cues help lattice formation of auditory-phonetic categories to enhance and refine speech identification.
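For readers unfamiliar with how "identification slopes" are quantified in CP paradigms, the sketch below fits a two-parameter logistic psychometric function to identification data and compares slopes across listening conditions. This is only a minimal illustration, not the authors' analysis pipeline; the continuum steps, response proportions, and condition labels are hypothetical.

```python
# Minimal sketch: estimating identification slopes from a labeling task.
# All data below are hypothetical and for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic: x0 = category boundary, k = slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)  # 7-step speech continuum (hypothetical)

# Hypothetical proportions of "token B" responses per condition
conditions = {
    "A (clear)":  np.array([0.02, 0.05, 0.10, 0.50, 0.90, 0.95, 0.98]),
    "A + noise":  np.array([0.10, 0.18, 0.30, 0.50, 0.70, 0.82, 0.90]),
    "AV + noise": np.array([0.04, 0.08, 0.15, 0.50, 0.85, 0.92, 0.96]),
}

for label, props in conditions.items():
    (x0, k), _ = curve_fit(logistic, steps, props, p0=[4.0, 1.0])
    print(f"{label:10s}  boundary = {x0:4.2f}  slope = {k:4.2f}")

# Shallower slopes (smaller k) indicate weaker, less categorical labeling,
# the pattern the abstract describes for auditory-only speech in noise.
```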