

[Intermodal timing cues for audio-visual speech recognition].

Author Information

Hashimoto Masahiro, Kumashiro Masaharu

Affiliation

Bio-information Research Center, University of Occupational and Environmental Health, Yahatanishi-ku, Kitakyushu 807-8555, Japan.

Publication Information

J UOEH. 2004 Jun 1;26(2):215-25. doi: 10.7888/juoeh.26.215.

Abstract

The purpose of this study was to investigate the limits of the lip-reading advantage for young Japanese adults by desynchronizing the visual and auditory components of speech. In the experiment, audio-visual speech stimuli were presented under six test conditions: audio-alone, and audio-visual with an audio delay of 0, 60, 120, 240 or 480 ms. The stimuli were video recordings of the face of a female Japanese speaker producing long and short Japanese sentences. The intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under audio-delay conditions of less than 120 ms was significantly better than under the audio-alone condition. Notably, the 120 ms delay corresponded to the mean mora duration measured for the audio stimuli. The results imply that audio delays of up to 120 ms do not disrupt the lip-reading advantage, because the visual and auditory components of speech appear to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise.
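The delay manipulation described above amounts to shifting the audio track later in time relative to the video by a fixed offset. A minimal sketch of how such an offset could be imposed on a sampled audio signal (a hypothetical helper for illustration, not code from the paper):

```python
# Condition offsets used in the study, in milliseconds.
AUDIO_DELAYS_MS = [0, 60, 120, 240, 480]

def delay_audio(samples, delay_ms, sample_rate=44100):
    """Return a copy of `samples` delayed by `delay_ms` milliseconds.

    The delay is realized by prepending silence (zeros), so the audio
    onset lags the (unchanged) video track by the given offset.
    """
    n_silence = int(round(delay_ms / 1000.0 * sample_rate))
    return [0.0] * n_silence + list(samples)

# Example: at a 1 kHz sample rate, a 120 ms delay prepends 120 zero samples.
delayed = delay_audio([0.5] * 100, 120, sample_rate=1000)
```

At a typical Japanese mora duration of roughly 120 ms, this largest "tolerated" offset in the study corresponds to a shift of about one mora.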


Similar Articles

1. [Intermodal timing cues for audio-visual speech recognition]. J UOEH. 2004 Jun 1;26(2):215-25. doi: 10.7888/juoeh.26.215.
2. Intermodal timing relations and audio-visual speech recognition by normal-hearing adults. J Acoust Soc Am. 1985 Feb;77(2):678-85. doi: 10.1121/1.392336.
3. Visual speech influences speech perception immediately but not automatically. Atten Percept Psychophys. 2017 Feb;79(2):660-678. doi: 10.3758/s13414-016-1249-6.
4. Differential Auditory and Visual Phase-Locking Are Observed during Audio-Visual Benefit and Silent Lip-Reading for Speech Perception. J Neurosci. 2022 Aug 3;42(31):6108-6120. doi: 10.1523/JNEUROSCI.2476-21.2022. Epub 2022 Jun 27.
5. Seeing to hear better: evidence for early audio-visual interactions in speech identification. Cognition. 2004 Sep;93(2):B69-78. doi: 10.1016/j.cognition.2004.01.006.
6. Degradation of labial information modifies audiovisual speech perception in cochlear-implanted children. Ear Hear. 2013 Jan-Feb;34(1):110-21. doi: 10.1097/AUD.0b013e3182670993.
7. The use of visible speech cues for improving auditory detection of spoken sentences. J Acoust Soc Am. 2000 Sep;108(3 Pt 1):1197-208. doi: 10.1121/1.1288668.
8. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng. 2019 Nov 6;16(6):066033. doi: 10.1088/1741-2552/ab4340.
