Prominence Detection Using Auditory Attention Cues and Task-Dependent High Level Information

Author information

Ozlem Kalinli, Shrikanth Narayanan

Affiliation

Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089, USA.

Publication information

IEEE Trans Audio Speech Lang Process. 2009 Jul 1;17(5):1009-1024. doi: 10.1109/tasl.2009.2014795.

DOI: 10.1109/tasl.2009.2014795

PMID: 20084186

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2806691/

Abstract

Auditory attention is a complex mechanism that involves the processing of low-level acoustic cues together with higher level cognitive cues. In this paper, a novel method is proposed that combines biologically inspired auditory attention cues with higher level lexical and syntactic information to model task-dependent influences on a given spoken language processing task. A set of low-level multiscale features (intensity, frequency contrast, temporal contrast, orientation, and pitch) is extracted in parallel from the auditory spectrum of the sound based on the processing stages in the central auditory system to create feature maps that are converted to auditory gist features that capture the essence of a sound scene. The auditory attention model biases the gist features in a task-dependent way to maximize target detection in a given scene. Furthermore, the top-down task-dependent influence of lexical and syntactic information is incorporated into the model using a probabilistic approach. The lexical information is incorporated by using a probabilistic language model, and the syntactic knowledge is modeled using part-of-speech (POS) tags. The combined model is tested on automatically detecting prominent syllables in speech using the BU Radio News Corpus. The model achieves 88.33% prominence detection accuracy at the syllable level and 85.71% accuracy at the word level. These results compare well with reported human performance on this task.
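The abstract states that lexical and syntactic information is incorporated "using a probabilistic approach" but does not give the combination rule. As an illustration only, the sketch below fuses per-syllable prominence posteriors from three evidence sources under a naive conditional-independence assumption, so that P(c | all sources) is proportional to the product of the per-source posteriors divided by P(c)^(n-1). The function name `fuse_posteriors`, the example probabilities, and the independence assumption itself are hypothetical and are not taken from the paper.

```python
import math

EPS = 1e-12  # clamp to keep log() finite for probabilities of exactly 0 or 1


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def fuse_posteriors(posteriors, prior):
    """Fuse P(prominent | source_i) values into one posterior.

    Assumption (not from the paper): the sources are conditionally
    independent given the class, which gives
    P(c | all) proportional to prod_i P(c | source_i) / P(c)^(n-1).
    """
    n = len(posteriors)
    prior = _clamp(prior)
    ps = [_clamp(p) for p in posteriors]

    # Score each class in the log domain to avoid underflow.
    log_pos = sum(math.log(p) for p in ps) - (n - 1) * math.log(prior)
    log_neg = sum(math.log(1.0 - p) for p in ps) - (n - 1) * math.log(1.0 - prior)

    # Normalise the two class scores into a probability.
    m = max(log_pos, log_neg)
    pos = math.exp(log_pos - m)
    neg = math.exp(log_neg - m)
    return pos / (pos + neg)


# Hypothetical posteriors for one syllable from the acoustic (gist-based),
# lexical (language model), and syntactic (POS) evidence, in that order.
fused = fuse_posteriors([0.8, 0.6, 0.7], prior=0.5)
is_prominent = fused > 0.5
```

With a single source the function returns that source's posterior unchanged, and with uninformative sources (all equal to the prior) it returns the prior, which is a quick sanity check on the fusion rule.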
