Irino Toshio, Patterson Roy D, Kawahara Hideki
Faculty of Systems Engineering, Wakayama University, Wakayama 640-8510, Japan
IEEE Trans Audio Speech Lang Process. 2006 Nov;14(6):2212-2221. doi: 10.1109/TASL.2006.872611.
We propose a new method to segregate concurrent speech sounds using an auditory version of a channel vocoder. The auditory representation of sound, referred to as an "auditory image," preserves fine temporal information, unlike conventional window-based processing systems. This makes it possible to segregate speech sources with an event-synchronous procedure. Fundamental frequency information is used to estimate the sequence of glottal pulse times for a target speaker and to suppress the glottal events of other speakers. The procedure leads to robust extraction of the target speech and effective segregation even when the signal-to-noise ratio is as low as 0 dB. Moreover, the segregation performance remains high when the speech contains jitter, or when the estimate of the fundamental frequency F0 is inaccurate. This contrasts with conventional comb-filter methods, where errors in F0 estimation produce a marked reduction in performance. We compared the new method to a comb-filter method using a cross-correlation measure and perceptual recognition experiments. The results suggest that the new method has the potential to supplant comb-filter and harmonic-selection methods for speech enhancement.
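The comb-filter baseline that the abstract contrasts with can be illustrated with a minimal sketch. This is a hypothetical, simplified feedforward comb filter, not the paper's implementation: a delay of one F0 period is added to the signal, so components at harmonics of F0 add constructively while other components are attenuated. It also makes the abstract's point about F0 sensitivity easy to verify: a small error in the F0 estimate changes the delay and spoils the cancellation of non-harmonic interference.

```python
import numpy as np

def comb_filter(x, fs, f0, gain=1.0):
    """Feedforward comb filter reinforcing harmonics of f0 (illustrative sketch).

    y[n] = (x[n] + gain * x[n - T]) / (1 + gain), with T = round(fs / f0).
    Components at integer multiples of f0 are delayed by a whole number of
    periods and add in phase; other frequencies are partially cancelled.
    """
    T = int(round(fs / f0))          # delay of one fundamental period, in samples
    y = np.copy(x)
    y[T:] += gain * x[:-T]           # add the one-period-delayed signal
    return y / (1.0 + gain)          # normalize so harmonics pass at unit gain
```

For example, with fs = 16 kHz and F0 = 100 Hz, a 200-Hz component (the 2nd harmonic) passes essentially unchanged, while a 150-Hz component (not a harmonic of 100 Hz) is delayed by 1.5 of its periods and cancels. Re-running the 150-Hz case with the F0 estimate off by a few percent leaves a clearly audible residual, which is the failure mode the proposed event-synchronous method is claimed to avoid.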