The Role of Auditory and Visual Modality in Perception of Focus in Mandarin Chinese

Author Information

Li Shanpeng, Wu Yihan, Calhoun Sasha, Yan Mengzhu

Affiliations

MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China.

School of Linguistics and Applied Language Studies, Te Herenga Waka-Victoria University of Wellington, New Zealand.

Publication Information

J Speech Lang Hear Res. 2025 Aug 12;68(8):3843-3860. doi: 10.1044/2025_JSLHR-24-00664. Epub 2025 Jul 11.

DOI: 10.1044/2025_JSLHR-24-00664

PMID: 40644325

Abstract

PURPOSE

Speech perception is a complex process that involves multiple sensory modalities. Despite our intuition that speech is something we hear, accumulating evidence shows that speech perception does not depend solely on the auditory modality. While it is well established that both auditory and visual cues can help listeners perceive focus, the contribution of visual cues has not been established for Mandarin, and the relative contribution of the two cue types has not been established at all. The current study investigated how Mandarin listeners integrate auditory and visual cues when interpreting focus in noise-degraded speech, using a question-answer appropriateness rating experiment.

METHOD

To explore the effectiveness and relative contribution of the auditory and visual modalities in the interpretation of Mandarin focus, participants completed a question-answer appropriateness rating task involving subject focus, object focus, and broad focus. All question-answer pairs were constructed in three modalities: audio only, visual only, and audiovisual. Participants were instructed to rate the appropriateness of each question-answer pair. Babble noise was superimposed on the audio track in the audio-only and audiovisual conditions.
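Superimposing noise on an audio track is usually done at a chosen signal-to-noise ratio (SNR). The abstract does not report the SNR, the noise source, or the mixing procedure used in this study, so the following is only a minimal sketch under stated assumptions. It scales a noise signal so that the speech-to-noise RMS ratio matches a target SNR in dB, then adds it sample by sample. The function names `rms`, `mix_at_snr`, and `measured_snr_db` are hypothetical helpers, and the Gaussian noise in the usage example is a stand-in for real babble, which is typically a mixture of several talkers' speech.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Hypothetical helper: add `noise` to `speech` at a target SNR (dB).

    The noise is repeated cyclically (or truncated) to match the speech
    length, then scaled so that 20*log10(rms(speech)/rms(scaled_noise))
    equals snr_db.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Tile the noise so it covers the whole speech track.
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise must not be silent")
    # Solve SNR(dB) = 20*log10(rms_speech / rms_noise) for the noise RMS.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixture):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = [m - s for s, m in zip(speech, mixture)]
    return 20 * math.log10(rms(speech) / rms(residual))


if __name__ == "__main__":
    random.seed(0)
    # A 1-second 220 Hz tone at 16 kHz stands in for a speech recording.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    # Gaussian noise stands in for multi-talker babble (an assumption).
    babble = [random.gauss(0, 1) for _ in range(8000)]
    mixed = mix_at_snr(speech, babble, snr_db=-5.0)
    print(round(measured_snr_db(speech, mixed), 6))
```

Scaling the noise rather than the speech keeps the speech level constant across conditions. A negative SNR, as in the usage example, means the noise is louder than the speech.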

RESULTS AND CONCLUSIONS

Although auditory cues conveyed through prosodic prominence were effective for interpreting focus, visual cues proved more effective, at least when the audio was degraded. Overall, this research contributes to our understanding of the interaction between linguistic cues and sensory information during language comprehension, widens the range of languages included in this body of research, and has important implications for future studies of focus processing across linguistic contexts and communication settings. This, in turn, will deepen our understanding of the multimodal nature of language comprehension.
