Yonghee Oh, Meg Schwalm, Nicole Kalpin
Department of Otolaryngology-Head and Neck Surgery and Communicative Disorders, University of Louisville, Louisville, KY, United States.
Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, FL, United States.
Front Neurosci. 2022 Oct 20;16:1031424. doi: 10.3389/fnins.2022.1031424. eCollection 2022.
A series of our previous studies explored the use of abstract visual representations of the amplitude envelope of target sentences to benefit speech perception in complex listening environments. The purpose of the present study was to extend this auditory-visual speech perception paradigm to the tactile domain. Twenty adults participated in speech recognition measurements in four sensory modalities (AO, auditory-only; AV, auditory-visual; AT, auditory-tactile; AVT, auditory-visual-tactile). The target sentences were presented at a fixed level of 65 dB sound pressure level and embedded in a simultaneous speech-shaped noise masker at varying signal-to-noise ratios (-7, -5, -3, -1, and 1 dB SNR). The amplitudes of both the abstract visual and the vibrotactile stimuli were temporally synchronized with the target speech envelope for comparison. On average, adding temporally synchronized multimodal cues to the auditory signal significantly improved word recognition performance in all three multimodal conditions (AV, AT, and AVT), especially at the lower SNRs of -7, -5, and -3 dB, for both male (8-20% improvement) and female (5-25% improvement) talkers. The greatest improvement in word recognition performance (15-19% for male talkers and 14-25% for female talkers) was observed when the visual and tactile cues were combined (AVT). Notably, the temporally synchronized abstract visual and vibrotactile cues combined additively in their effect on speech recognition performance. These findings suggest that multisensory integration in speech perception requires salient temporal cues to enhance speech recognition in noisy environments.
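For readers who want a concrete picture of the stimulus construction described above, the following is a minimal Python sketch (not the authors' actual code) of the two signal-processing steps the abstract implies: mixing a target sentence with a noise masker at a chosen SNR, and extracting the target's amplitude envelope to drive the abstract visual and vibrotactile stimuli. The function names, the Hilbert-plus-low-pass envelope method, and the 4 Hz cutoff are illustrative assumptions; the paper does not specify its extraction pipeline, and a synthetic tone stands in for a recorded sentence.

import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def mix_at_snr(target: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the target-to-noise power ratio equals `snr_db`, then sum."""
    noise = noise[: len(target)]
    p_target = np.mean(target ** 2)
    p_noise = np.mean(noise ** 2)
    # gain^2 * p_noise must equal p_target / 10^(snr_db/10)
    gain = np.sqrt(p_target / (p_noise * 10 ** (snr_db / 10.0)))
    return target + gain * noise

def amplitude_envelope(signal: np.ndarray, fs: int, cutoff_hz: float = 4.0) -> np.ndarray:
    """Magnitude of the analytic signal, low-pass filtered to a slow envelope.

    The 4 Hz cutoff is an assumption chosen to track syllable-rate modulations.
    """
    env = np.abs(hilbert(signal))
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env)

# Example: synthetic "sentence" (amplitude-modulated tone) stands in for speech.
fs = 16000
t = np.arange(fs) / fs
sentence = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
noise = np.random.default_rng(0).normal(size=fs)  # stand-in for speech-shaped noise

for snr in (-7, -5, -3, -1, 1):  # the dB SNR conditions listed in the abstract
    stimulus = mix_at_snr(sentence, noise, snr)

# The same envelope would set the instantaneous amplitude of the abstract
# visual and vibrotactile stimuli, keeping them temporally synchronized
# with the target speech.
envelope = amplitude_envelope(sentence, fs)

In practice the study used recorded sentences and a speech-shaped masker rather than the tone and Gaussian noise above; the sketch is only meant to show how a fixed-level target and an SNR-scaled masker yield the listed conditions, and how one envelope signal can drive both non-auditory modalities.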