RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo, Forskningsveien 3A, 1094 Blindern, 0317, Oslo, Norway.
Department of Psychology, University of Oslo, Oslo, Norway.
Sci Rep. 2021 Nov 17;11(1):22442. doi: 10.1038/s41598-021-01797-z.
Cross-modal integration is ubiquitous within perception and, in humans, the McGurk effect demonstrates that seeing a person articulating speech can change what we hear into a new auditory percept. It remains unclear whether cross-modal integration of sight and sound generalizes to other visible vocal articulations like those made by singers. We surmise that perceptual integrative effects should involve music deeply, since there is ample indeterminacy and variability in its auditory signals. We show that switching videos of sung musical intervals systematically changes the estimated distance between the two notes of a musical interval: pairing the video of a smaller sung interval with a relatively larger auditory interval led to compression effects on rated intervals, whereas the reverse pairing led to a stretching effect. In addition, after seeing a visually switched video of an equal-tempered sung interval and then hearing the same interval played on the piano, the two intervals were often judged to be different, even though they differed only in instrument. These findings reveal spontaneous cross-modal integration of vocal sounds and clearly indicate that strong integration of sound and sight can occur beyond the articulations of natural speech.