Swaminathan Jayaganesh, Mason Christine R, Streeter Timothy M, Best Virginia, Roverud Elin, Kidd Gerald
Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215
J Neurosci. 2016 Aug 3;36(31):8250-7. doi: 10.1523/JNEUROSCI.4421-15.2016.
While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separation of the target speech signal in azimuth from the interfering masker signals can lead to an improvement in target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions of two stimulus properties that vary with separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), to SRM in normal-hearing (NH) human listeners. Target speech was presented from the front and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which the intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while either retaining them in or eliminating them from the TFS. This was achieved by using either the same or different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results from this study revealed that binaural TFS cues, especially for frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed.
Acoustic signals received by the auditory system pass first through an array of physiologically based band-pass filters. Conceptually, at the output of each filter, there are two principal forms of temporal information: slowly varying fluctuations in the envelope (ENV) and rapidly varying fluctuations in the temporal fine structure (TFS). The importance of these two types of information in everyday listening (e.g., conversing in a noisy social situation; the "cocktail-party" problem) has not been established. This study assessed the contributions of binaural ENV and TFS cues for understanding speech in multiple-talker situations. Results suggest that, whereas the ENV cues are important for speech intelligibility, binaural TFS cues are critical for perceptually segregating the different talkers and thus for solving the cocktail party problem.
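The carrier manipulation described in the abstract can be illustrated with a toy, single-band sketch. This is not the study's vocoder or its auditory-nerve model. It assumes one slowly modulated envelope standing in for a speech-band ENV, Gaussian noise carriers, no interaural time or level differences, and a crude rectify-and-smooth envelope estimate. All function names and parameter values (sampling rate, modulation rate, smoothing window) are hypothetical. The point is only that using the same versus different noise carriers in the two ears changes the interaural correlation of the waveform fine structure while leaving the envelope correlation high.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def toy_envelope(n_samples, fs, mod_hz=4.0):
    """Slow, non-negative modulator used as a stand-in for a speech-band ENV (assumption)."""
    return [0.5 * (1.0 + math.sin(2.0 * math.pi * mod_hz * t / fs))
            for t in range(n_samples)]


def smooth_rectified(signal, window):
    """Crude envelope estimate: full-wave rectify, then causal moving average."""
    rect = [abs(v) for v in signal]
    out = []
    running = 0.0
    for i, v in enumerate(rect):
        running += v
        if i >= window:
            running -= rect[i - window]
        out.append(running / min(i + 1, window))
    return out


def vocode_binaural(env, same_carrier, seed=0):
    """Impose one envelope on noise carriers that are shared or independent across ears."""
    rng = random.Random(seed)
    left_carrier = [rng.gauss(0.0, 1.0) for _ in env]
    if same_carrier:
        right_carrier = left_carrier
    else:
        right_carrier = [rng.gauss(0.0, 1.0) for _ in env]
    left = [e * c for e, c in zip(env, left_carrier)]
    right = [e * c for e, c in zip(env, right_carrier)]
    return left, right


def interaural_correlations(same_carrier, fs=8000, window=80):
    """Return (waveform correlation, smoothed-envelope correlation) across ears."""
    env = toy_envelope(fs, fs)  # one second of signal
    left, right = vocode_binaural(env, same_carrier)
    wave_corr = pearson(left, right)
    env_corr = pearson(smooth_rectified(left, window),
                       smooth_rectified(right, window))
    return wave_corr, env_corr


if __name__ == "__main__":
    for same in (True, False):
        wave_corr, env_corr = interaural_correlations(same)
        label = "same carrier" if same else "different carriers"
        print(f"{label}: waveform r = {wave_corr:.3f}, envelope r = {env_corr:.3f}")
```

In this sketch, the shared-carrier condition gives a waveform correlation near 1, and the independent-carrier condition gives a waveform correlation near 0. The smoothed-envelope correlation stays high in both cases. A fuller treatment would use a multi-band filterbank and a peripheral auditory model, as the study did.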