
An analysis of the influence of deep neural network (DNN) topology in bottleneck feature based language recognition

Author Information

Lozano-Diez Alicia, Zazo Ruben, Toledano Doroteo T, Gonzalez-Rodriguez Joaquin

Affiliations

Audias-UAM, Universidad Autonoma de Madrid, Madrid, Spain.

Publication Information

PLoS One. 2017 Aug 10;12(8):e0182580. doi: 10.1371/journal.pone.0182580. eCollection 2017.

Abstract

Language recognition systems based on bottleneck features have recently become the state of the art in this research field, showing their success in the last Language Recognition Evaluation (LRE 2015) organized by NIST (U.S. National Institute of Standards and Technology). This type of system is based on a deep neural network (DNN) trained to discriminate between phonetic units, i.e., trained for the task of automatic speech recognition (ASR). This DNN compresses information in one of its layers, known as the bottleneck (BN) layer, which is used to obtain a new frame-level representation of the audio signal. This representation has proven useful for the task of language identification (LID). Thus, bottleneck features are used as input to the language recognition system instead of a classical parameterization of the signal based on cepstral feature vectors such as MFCCs (Mel Frequency Cepstral Coefficients). Despite the success of this approach in language recognition, there is a lack of studies systematically analyzing how the topology of the DNN influences the performance of bottleneck feature-based language recognition systems. In this work, we try to fill in this gap, analyzing language recognition results obtained with different topologies for the DNN used to extract the bottleneck features, comparing them with each other and against a reference system based on a more classical cepstral representation of the input signal with a total variability model. In this way, we obtain useful knowledge about how the DNN configuration influences the performance of bottleneck feature-based language recognition systems.
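The pipeline described in the abstract, a DNN trained on phonetic targets whose narrow hidden layer yields frame-level features for a downstream LID system, can be sketched as follows. This is a minimal illustration only: the class name `BottleneckDNN`, the layer sizes, the depth, and the random weights are all hypothetical assumptions, not the topologies evaluated in the paper, and training on ASR targets is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class BottleneckDNN:
    """Feed-forward DNN with a narrow bottleneck (BN) layer.

    Hypothetical sizes for illustration: the input is a stacked context
    window of cepstral frames, the hidden layers are wide, the BN layer
    is narrow, and the output layer would target phonetic units.
    """
    def __init__(self, dim_in=440, dim_hidden=1500, dim_bn=80,
                 n_targets=3000, seed=0):
        rng = np.random.default_rng(seed)
        dims = [dim_in, dim_hidden, dim_hidden, dim_bn, n_targets]
        # He-style initialization; in practice these weights come from
        # training the whole network to classify phonetic units (ASR).
        self.weights = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)
                        for a, b in zip(dims[:-1], dims[1:])]
        self.bn_index = 2  # activations after this layer are the BN features

    def bottleneck_features(self, frames):
        """Map input frames (n, dim_in) to BN features (n, dim_bn)."""
        h = frames
        for i, w in enumerate(self.weights):
            h = relu(h @ w)
            if i == self.bn_index:
                return h  # stop at the bottleneck; the output layer is unused
        return h

dnn = BottleneckDNN()
x = np.random.default_rng(1).standard_normal((10, 440))  # 10 dummy frames
bn = dnn.bottleneck_features(x)
print(bn.shape)  # (10, 80)
```

The key design point is that the classification head (phonetic targets) is discarded at extraction time: only the compressed BN activations are kept, and these replace MFCC-style features as input to the total-variability language recognition back-end.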

