Fishman Yonatan I, Micheyl Christophe, Steinschneider Mitchell
Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461.
Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455; Starkey Hearing Research Center, Berkeley, California 94704.
eNeuro. 2016 Jun 10;3(3). doi: 10.1523/ENEURO.0071-16.2016. eCollection 2016 May-Jun.
Successful speech perception in real-world environments requires that the auditory system segregate competing voices that overlap in frequency and time into separate streams. Vowels are major constituents of speech and are composed of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). The pitch and identity of a vowel are determined by its F0 and spectral envelope (formant structure), respectively. When two spectrally overlapping vowels differing in F0 are presented concurrently, they can be readily perceived as two separate "auditory objects" with pitches at their respective F0s. A difference in pitch between two simultaneous vowels provides a powerful cue for their segregation, which, in turn, facilitates their individual identification. The neural mechanisms underlying the segregation of concurrent vowels based on pitch differences are poorly understood. Here, we examine neural population responses in macaque primary auditory cortex (A1) to single and double concurrent vowels (/a/ and /i/) that differ in F0 such that they are heard as two separate auditory objects with distinct pitches. We find that neural population responses in A1 can resolve, via a rate-place code, lower harmonics of both single and double concurrent vowels. Furthermore, we show that the formant structures, and hence the identities, of single vowels can be reliably recovered from the neural representation of double concurrent vowels. We conclude that A1 contains sufficient spectral information to enable concurrent vowel segregation and identification by downstream cortical areas.
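As an illustration of the harmonic structure described above, the following minimal Python sketch lists the harmonics of two vowels with different F0s and the components present when they sound together. The F0 values, the frequency cutoff, and the helper name `harmonics` are hypothetical choices made for this example; the abstract does not state the stimulus parameters, and the sketch says nothing about how the neural responses were analyzed.

```python
def harmonics(f0_hz, max_hz):
    """Return the integer multiples of f0_hz that do not exceed max_hz."""
    count = int(max_hz // f0_hz)
    return [f0_hz * k for k in range(1, count + 1)]

# Hypothetical values for illustration only; not taken from the study.
F0_A = 100.0     # assumed F0 of vowel /a/, in Hz
F0_I = 125.0     # assumed F0 of vowel /i/, in Hz
MAX_HZ = 1000.0  # assumed upper limit of the "lower harmonics" considered

harm_a = harmonics(F0_A, MAX_HZ)
harm_i = harmonics(F0_I, MAX_HZ)

# Frequency components present when both vowels are presented concurrently.
double_vowel = sorted(set(harm_a) | set(harm_i))

# Components that belong to both harmonic series and so cannot, by
# frequency alone, be attributed to one vowel.
shared = sorted(set(harm_a) & set(harm_i))

print("harmonics of /a/:", harm_a)
print("harmonics of /i/:", harm_i)
print("double vowel components:", double_vowel)
print("shared components:", shared)
```

With these assumed values, most components of the mixture belong to only one of the two harmonic series, which is the kind of spectral separation that a rate-place representation of resolved harmonics could, in principle, exploit.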