Intermodal timing relations and audio-visual speech recognition by normal-hearing adults.

Author information

McGrath M, Summerfield Q

Publication information

J Acoust Soc Am. 1985 Feb;77(2):678-85. doi: 10.1121/1.392336.

Abstract

Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range from 0 to 80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of liplike Lissajous figures. Group-mean difference limens (70.7% correct DLs) were -79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.
