Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA.
Event Lab, Department of Clinical Psychology and Psychobiology, University of Barcelona, Barcelona, 08035, Spain.
Sci Rep. 2017 Jun 19;7(1):3817. doi: 10.1038/s41598-017-04201-x.
Humans are good at selectively listening to specific target conversations, even in the presence of multiple concurrent speakers. In our research, we study how auditory-visual cues modulate this selective listening. We do so by using immersive Virtual Reality technologies with spatialized audio. Exposing 32 participants to an Information Masking Task with concurrent speakers, we find significantly more errors in the decision-making processes triggered by asynchronous audiovisual speech cues. More precisely, the results show that matching the lip movements of the Target speaker to a secondary (Mask) speaker's audio severely increases the participants' comprehension error rates. In a control experiment (n = 20), we further explore the influence of the visual modality on auditory selective attention. The results show a dominance of visual-speech cues, which effectively turn the Mask into the Target and vice versa. These results reveal a disruption of selective attention triggered by bottom-up multisensory integration. The findings are framed within theories of sensory perception and cognitive neuroscience. The VR setup is validated in a supplementary experiment by replicating previous results from this literature.