Wang DL, Brown GJ.
Department of Computer and Information Science and Center for Cognitive Science, The Ohio State University, Columbus, OH 43210-1277, USA.
IEEE Trans Neural Netw. 1999;10(3):684-97. doi: 10.1109/72.761727.
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity and proximity in frequency and time. The oscillator network is preceded by a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and it yields an improvement in signal-to-noise ratio for every mixture. The performance of the model is compared with that of other studies on computational auditory scene analysis. Issues including biological plausibility and real-time implementation are also discussed.
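The central mechanism described in the abstract, synchronization of relaxation oscillators linked by excitatory lateral connections, can be illustrated with a minimal sketch. The code below assumes the Terman-Wang relaxation oscillator form used in LEGION-style networks (dx/dt = 3x - x^3 + 2 - y + I, dy/dt = epsilon * (gamma * (1 + tanh(x / beta)) - y)); the threshold coupling scheme, the two-oscillator setup, and all parameter values are illustrative assumptions for exposition, not the paper's actual network or training corpus.

```python
import numpy as np

def simulate(steps=20000, dt=0.01, eps=0.02, beta=0.1,
             gamma=6.0, I=0.8, W=1.0, seed=0):
    """Euler integration of two coupled Terman-Wang relaxation
    oscillators. Parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=2)   # fast (excitatory) variables
    y = rng.uniform(0.0, 6.0, size=2)    # slow (recovery) variables
    trace = np.empty((steps, 2))
    for t in range(steps):
        # Lateral excitatory coupling: each oscillator receives input
        # when the other is in its active phase (simple threshold rule,
        # an assumption standing in for the paper's weighted connections).
        S = W * (x[::-1] > 0.0)
        dx = 3.0 * x - x**3 + 2.0 - y + I + S   # fast equation
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)  # slow equation
        x = x + dt * dx
        y = y + dt * dy
        trace[t] = x
    return trace

trace = simulate()
# With W > 0 the two oscillators phase-lock, modeling features that
# belong to one stream; with W = 0 their phases drift apart, the
# desynchronized state that would represent separate streams.
```

The design point this sketch captures is why oscillatory correlation suits grouping: binding is expressed in phase rather than in dedicated "grouping" units, so a single network can represent a variable number of streams simply as distinct synchronized populations.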