Department of Psychological and Brain Sciences, University of Iowa.
Interdisciplinary Program in Neuroscience, University of Iowa.
Cogn Sci. 2019 Jan;43(1). doi: 10.1111/cogs.12700.
Speech unfolds over time, and the cues for even a single phoneme are rarely available simultaneously. Consequently, to recognize a single phoneme, listeners must integrate material over several hundred milliseconds. Prior work contrasts two accounts: (a) a memory buffer account in which listeners accumulate auditory information in memory and only access higher-level representations (i.e., lexical representations) when sufficient information has arrived; and (b) an immediate integration scheme in which lexical representations can be partially activated on the basis of early cues and then updated when more information arises. These studies have uniformly shown evidence for immediate integration for a variety of phonetic distinctions. We attempted to extend this finding to fricatives, a class of speech sounds that requires not only temporal integration of asynchronous cues (the frication, followed by the formant transitions 150-350 ms later), but also integration across different frequency bands and compensation for contextual factors like coarticulation. Eye movements in the visual world paradigm showed clear evidence for a memory buffer. Results were replicated in five experiments, ruling out methodological factors and tying the release of the buffer to the onset of the vowel. These findings support a general auditory account for speech by suggesting that the acoustic nature of particular speech sounds may have large effects on how they are processed. They also have major implications for theories of auditory and speech perception by raising the possibility of an encapsulated memory buffer in early auditory processing.