Department of Otolaryngology - Head and Neck Surgery, Iwate Medical University, 19-1, Uchimaru, Morioka, Iwate, Japan.
Auris Nasus Larynx. 2020 Oct;47(5):727-733. doi: 10.1016/j.anl.2020.02.008. Epub 2020 Feb 24.
The purpose of this study was to measure the auditory evoked potentials for speech and non-speech sounds with similar spectral distributions.
We developed two types of sounds: naturally spoken vowels (natural speech sounds) and complex synthesized sounds (synthesized sounds). The natural speech sounds consisted of 5 Japanese vowels. Each synthesized sound consisted of a fundamental frequency and its second to fifteenth harmonics, equivalent to those of the corresponding natural speech sound, and was filtered to have a spectral distribution similar to that of the natural speech sound. All sounds were low-pass filtered at 2000 Hz. The auditory evoked potentials elicited by the natural speech sound /o/ and its synthesized counterpart were measured in 10 right-handed healthy adults with normal hearing.
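The construction of a synthesized sound described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the fundamental frequency (150 Hz), sample rate, duration, and 1/k harmonic weights are placeholder values that do not come from the study, and the 2000 Hz low-pass filter is approximated by simply omitting harmonics above the cutoff.

```python
import math

def harmonic_complex(f0, weights, sample_rate=16000, duration=0.2, cutoff_hz=2000.0):
    """Sum a fundamental and its harmonics, skipping partials above cutoff_hz.

    weights[k] is the linear amplitude of harmonic k + 1
    (weights[0] belongs to the fundamental itself).
    Returns (peak-normalized samples, list of harmonic numbers kept).
    """
    n_samples = round(sample_rate * duration)
    # Idealized stand-in for a low-pass filter: drop partials above the cutoff.
    partials = [(k + 1, w) for k, w in enumerate(weights) if (k + 1) * f0 <= cutoff_hz]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(w * math.sin(2 * math.pi * h * f0 * t) for h, w in partials))
    # Normalize so the largest absolute sample value is 1.0.
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples], [h for h, _ in partials]

# Hypothetical parameters (assumptions for illustration only).
f0 = 150.0                                     # assumed fundamental frequency in Hz
weights = [1.0 / (k + 1) for k in range(15)]   # fundamental plus harmonics 2 to 15
signal, kept = harmonic_complex(f0, weights)
```

In a real stimulus the per-harmonic weights would be derived from the measured spectrum of the corresponding natural vowel rather than from a fixed 1/k roll-off; they are a parameter here so that any spectral envelope can be supplied.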
The natural speech sounds were recognized as speech significantly more often than the synthesized sounds (74.4% vs. 13.8%, p < 0.01). For the vowel /o/, the natural speech and synthesized sounds contrasted strongly in speech perception (96.9% vs. 9.4%, p < 0.01). However, the vowel /i/ and its counterpart were barely recognized as speech (4.7% vs. 3.1%, p = 1.00). The N1 peak amplitudes and latencies evoked by the natural speech sound /o/ did not differ from those evoked by the synthesized sound (p = 0.58 and p = 0.28, respectively). The P2 amplitudes evoked by the natural speech sound /o/ also did not differ from those evoked by the synthesized sound (p = 0.51). The P2 latencies evoked by the natural speech sound /o/ were significantly shorter than those evoked by the synthesized sound (p < 0.01). This latency difference was not observed in a control study using the vowel /i/ and its counterpart (p = 0.29).
The earlier P2 observed for the natural speech sound may reflect central auditory processing of the 'speechness' of complex sounds.