Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization.

Author information

Ames Heather, Grossberg Stephen

Affiliation

Department of Cognitive and Neural Systems, Center for Adaptive Systems, and Center of Excellence for Learning in Education, Science, and Technology, Boston University, Boston, Massachusetts 02215, USA.

Publication information

J Acoust Soc Am. 2008 Dec;124(6):3918-36. doi: 10.1121/1.2997478.

Abstract

Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H.L., J. Acoust. Soc. Am. 24, 175-184 (1952).] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
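The two stages the abstract describes, a speaker normalization step followed by fast, stable categorization via adaptive resonance theory (ART) circuits, can be illustrated with a minimal sketch. This is not the paper's model: the formant values, the per-speaker scale factor, and the `SimpleART` class below are invented for illustration, and the categorizer is only a distance-and-vigilance caricature of ART matching.

```python
import numpy as np

def normalize(formants, speaker_scale):
    """Remove a per-speaker scale factor (a crude stand-in for the
    pitch-independent strip-map normalization in the model)."""
    return np.asarray(formants, dtype=float) / speaker_scale

class SimpleART:
    """Toy ART-like categorizer: match the input to the closest stored
    prototype; if the best match falls below the vigilance threshold,
    recruit a new category instead of distorting an old one."""

    def __init__(self, vigilance=0.9):
        self.vigilance = vigilance
        self.prototypes = []

    def categorize(self, x):
        x = np.asarray(x, dtype=float)
        for i, p in enumerate(self.prototypes):
            match = 1.0 - np.linalg.norm(x - p) / (np.linalg.norm(p) + 1e-9)
            if match >= self.vigilance:
                # "Resonance": refine the winning prototype slowly,
                # so learned categories remain stable.
                self.prototypes[i] = 0.9 * p + 0.1 * x
                return i
        self.prototypes.append(x)  # no resonance: create a new category
        return len(self.prototypes) - 1

# Two speakers producing the "same" vowel at different vocal-tract scales
# map to one category after normalization; a different vowel does not.
art = SimpleART(vigilance=0.9)
vowel_a_male   = normalize([730, 1090], speaker_scale=1.0)   # /a/-like F1, F2 (Hz)
vowel_a_female = normalize([850, 1220], speaker_scale=1.15)  # same vowel, scaled speaker
vowel_i_male   = normalize([270, 2290], speaker_scale=1.0)   # /i/-like F1, F2 (Hz)

c1 = art.categorize(vowel_a_male)
c2 = art.categorize(vowel_a_female)
c3 = art.categorize(vowel_i_male)
```

In this sketch the two /a/-like tokens fall into one category while the /i/-like token recruits a second, which is the qualitative behavior the abstract claims for normalized Peterson-Barney vowels; the real model operates on full auditory spectra and learned strip maps, not two hand-picked formants.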
