Callan Daniel E, Jones Jeffery A, Munhall Kevin, Kroos Christian, Callan Akiko M, Vatikiotis-Bateson Eric
ATR International, Kyoto, Japan.
J Cogn Neurosci. 2004 Jun;16(5):805-16. doi: 10.1162/089892904970771.
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information, and this enhancement is most pronounced when the audio signal is degraded. One potential means by which the brain achieves this perceptual enhancement is thought to be the integration of concordant information from multiple sensory channels at common sites of convergence: multisensory integration (MSI) sites. Some studies have identified candidate sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. A limitation of these studies is that they do not control for activity resulting from attentional modulation, cued, for example, by visual information signaling the onsets and offsets of the acoustic speech signal, or for activity resulting from MSI of properties of the auditory speech signal with gross visual motion that carries no place-of-articulation information. This fMRI experiment uses spatially wavelet bandpass-filtered Japanese sentences, presented with multispeaker background audio noise, to discern brain activity reflecting MSI induced by auditory and visual correspondence of place-of-articulation information while controlling for activity resulting from the factors above. The experiment comprises a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place-of-articulation information; a midfrequency (MF) filtered condition containing place-of-articulation information; and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place-of-articulation information were determined by the presence of activity in both the MF and UF conditions relative to the LF condition.
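The spatial wavelet bandpass filtering described above can be sketched in code. The abstract does not specify the wavelet family or band cutoffs used, so the following is a minimal illustration with a hand-rolled 2D Haar transform: a video frame is decomposed over several levels, detail bands outside a chosen set of levels are zeroed along with the residual low-pass band, and the frame is reconstructed. The function names and the choice of kept levels are assumptions for illustration only.

```python
import numpy as np

def haar_analysis(x):
    """One level of a 2D Haar transform: approximation + 3 detail bands."""
    x00, x01 = x[0::2, 0::2], x[0::2, 1::2]
    x10, x11 = x[1::2, 0::2], x[1::2, 1::2]
    a = (x00 + x01 + x10 + x11) / 2.0  # low-pass approximation
    h = (x00 - x01 + x10 - x11) / 2.0  # horizontal detail
    v = (x00 + x01 - x10 - x11) / 2.0  # vertical detail
    d = (x00 - x01 - x10 + x11) / 2.0  # diagonal detail
    return a, (h, v, d)

def haar_synthesis(a, details):
    """Invert one level of haar_analysis."""
    h, v, d = details
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = (a + h + v + d) / 2.0
    x[0::2, 1::2] = (a - h + v - d) / 2.0
    x[1::2, 0::2] = (a + h - v - d) / 2.0
    x[1::2, 1::2] = (a - h - v + d) / 2.0
    return x

def wavelet_bandpass(frame, levels, keep):
    """Spatial bandpass: keep only detail bands at levels in `keep`
    (level 1 = finest spatial frequencies) and drop the low-pass band."""
    coeffs, a = [], frame.astype(float)
    for lev in range(1, levels + 1):
        a, det = haar_analysis(a)
        coeffs.append(det if lev in keep
                      else tuple(np.zeros_like(c) for c in det))
    a = np.zeros_like(a)  # zero the residual low-pass (LF) band
    for det in reversed(coeffs):
        a = haar_synthesis(a, det)
    return a
```

In this sketch, a "midfrequency" stimulus would correspond to keeping an intermediate detail level (e.g. `keep={2}` with `levels=3`), while the LF condition would retain only the coarsest bands.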
Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG) and the left STG/S (including auditory cortex). By controlling for additional factors that could induce greater activity through visual motion information alone, this study identifies candidate MSI sites that we believe are involved in the improvement of speech intelligibility.
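The site-selection criterion above is a conjunction: a voxel counts as an MSI site only if it is active in both the MF-versus-LF and UF-versus-LF contrasts. A minimal sketch of that logic, using made-up random maps and an illustrative threshold (the actual statistics and threshold are not given in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical voxelwise statistic maps for the two contrasts
# (MF - LF) and (UF - LF); real maps would come from a GLM fit.
t_mf_vs_lf = rng.normal(size=(4, 4, 4))
t_uf_vs_lf = rng.normal(size=(4, 4, 4))

t_crit = 1.5  # illustrative threshold, not the paper's value

# MSI sites: voxels exceeding threshold in BOTH contrasts.
msi_mask = (t_mf_vs_lf > t_crit) & (t_uf_vs_lf > t_crit)
```

The elementwise `&` on boolean maps implements the conjunction: each surviving voxel must show the MF and UF effects simultaneously, which is what rules out responses driven by gross visual motion alone.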