Kalinli Ozlem, Narayanan Shrikanth
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA.
IEEE Trans Audio Speech Lang Process. 2009 Jul 1;17(5):1009-1024. doi: 10.1109/tasl.2009.2014795.
Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.
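To illustrate how a bottom-up acoustic score might be combined with top-down lexical and syntactic evidence "using a probabilistic approach", here is a minimal Python sketch. The abstract does not give the actual combination rule, so everything in it is an assumption made for illustration: the function names `fuse_prominence` and `is_prominent`, the naive-Bayes-style fusion (which treats the three cues as conditionally independent given the prominence class), the default prior, and the decision threshold are all hypothetical and should not be read as the authors' method.

```python
import math


def _logit(p, eps=1e-9):
    """Log-odds of a probability, clipped away from 0 and 1 to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_prominence(p_acoustic, p_lexical, p_syntactic, prior=0.5):
    """Fuse three per-syllable posteriors P(prominent | cue) into one posterior.

    ASSUMPTION: the cues are conditionally independent given the class, so the
    posterior log-odds equal the prior log-odds plus each cue's log-odds shift
    relative to the prior. This is an illustrative choice, not the paper's rule.
    """
    prior_logit = _logit(prior)
    evidence = sum(
        _logit(p) - prior_logit
        for p in (p_acoustic, p_lexical, p_syntactic)
    )
    return 1.0 / (1.0 + math.exp(-(prior_logit + evidence)))


def is_prominent(p_acoustic, p_lexical, p_syntactic, prior=0.5, threshold=0.5):
    """Hypothetical decision rule: label a syllable prominent above a threshold."""
    return fuse_prominence(p_acoustic, p_lexical, p_syntactic, prior) > threshold
```

In this sketch, a cue that reports exactly the prior probability contributes no evidence, so the fused posterior reduces to whatever the remaining cues support; two cues that agree reinforce each other beyond either one alone.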