
Vision perceptually restores auditory spectral dynamics in speech.

Affiliations

Department of Psychology, University of Michigan, Ann Arbor, MI 48109;

Department of Psychology, Northwestern University, Evanston, IL 60208.

Publication Information

Proc Natl Acad Sci U S A. 2020 Jul 21;117(29):16920-16927. doi: 10.1073/pnas.2002887117. Epub 2020 Jul 6.

Abstract

Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
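The central finding, that formant trajectories can be predicted from the changing shape of the mouth, can be illustrated with a minimal regression sketch. Everything below is an assumption for illustration: the features (lip aperture height, lip width, inter-lip area), the synthetic data, and the ridge-regression model are placeholders, not the paper's actual measurements or analysis pipeline.

# Hedged sketch: frame-by-frame regression from hypothetical mouth-shape
# features to formant frequencies (F1, F2). Data and model are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_frames = 2000

# Hypothetical per-frame oral-deformation features: lip aperture height,
# lip width, and inter-lip area (arbitrary units).
mouth_shape = rng.random((n_frames, 3))

# Synthetic F1/F2 tracks (Hz) with a built-in linear dependence on the
# features plus noise, standing in for measured formant contours.
true_weights = np.array([[400.0, 150.0, 50.0],
                         [-100.0, 900.0, 200.0]])
formants = mouth_shape @ true_weights.T + np.array([500.0, 1500.0])
formants += rng.normal(scale=30.0, size=formants.shape)

X_train, X_test, y_train, y_test = train_test_split(
    mouth_shape, formants, test_size=0.25, random_state=0)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", model.score(X_test, y_test))

Because the synthetic targets are linear in the features, the held-out R^2 is high here by construction; the paper's contribution is the empirical claim that real formant dynamics admit similarly precise prediction from real oral deformations.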

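The comprehension experiment contrasted spectral with temporal degradation of sentence spectrograms. A minimal sketch of one possible such manipulation, not the authors' exact procedure, is to smear a magnitude spectrogram with a Gaussian kernel along the frequency axis (spectral degradation) or the time axis (temporal degradation); the kernel width and spectrogram dimensions below are arbitrary assumptions.

# Hedged sketch: axis-selective blurring of a magnitude spectrogram.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def degrade_spectrogram(spec, axis, sigma):
    # axis=0 smears across frequency bins (spectral degradation);
    # axis=1 smears across time frames (temporal degradation).
    # sigma is the Gaussian width in bins/frames (illustrative value only).
    return gaussian_filter1d(spec, sigma=sigma, axis=axis)

rng = np.random.default_rng(0)
spec = rng.random((128, 400))  # stand-in spectrogram: 128 freq bins x 400 frames
spectrally_degraded = degrade_spectrogram(spec, axis=0, sigma=8.0)
temporally_degraded = degrade_spectrogram(spec, axis=1, sigma=8.0)

Under the paper's account, visual speech should compensate more for the first manipulation than the second, because oral deformations carry information about spectral rather than temporal structure.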

Similar Articles

1. Vision perceptually restores auditory spectral dynamics in speech.
Proc Natl Acad Sci U S A. 2020 Jul 21;117(29):16920-16927. doi: 10.1073/pnas.2002887117. Epub 2020 Jul 6.
5. Neural Mechanisms Underlying Cross-Modal Phonetic Encoding.
J Neurosci. 2018 Feb 14;38(7):1835-1849. doi: 10.1523/JNEUROSCI.1566-17.2017. Epub 2017 Dec 20.
6. How visual cues to speech rate influence speech perception.
Q J Exp Psychol (Hove). 2020 Oct;73(10):1523-1536. doi: 10.1177/1747021820914564. Epub 2020 Apr 20.

Cited By

5. Multisensory and lexical information in speech perception.
Front Hum Neurosci. 2024 Jan 8;17:1331129. doi: 10.3389/fnhum.2023.1331129. eCollection 2023.
