Bawa Puneet, Kadyan Virender, Tripathy Abinash, Singh Thipendra P
Centre of Excellence for Speech and Multimodal Laboratory, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
Speech and Language Research Centre, School of Computer Science, University of Petroleum and Energy Studies (UPES), Energy Acres, Bidholi, Dehradun, Uttarakhand 248007 India.
Complex Intell Systems. 2023;9(1):1-23. doi: 10.1007/s40747-022-00651-7. Epub 2022 Jun 2.
Development of a robust native-language ASR framework is very challenging and remains an active area of research. Effective front-end as well as back-end approaches must be investigated to tackle environmental differences, large training complexity, and inter-speaker variability in building a successful recognition system. In this paper, four front-end approaches have been investigated to generate unique and robust feature vectors at different SNR values: mel-frequency cepstral coefficients (MFCC), gammatone frequency cepstral coefficients (GFCC), relative spectral perceptual linear prediction (RASTA-PLP), and power-normalized cepstral coefficients (PNCC). Furthermore, to handle the large training-data complexity, parameter optimization has been performed with sequence-discriminative training techniques: maximum mutual information (MMI), minimum phone error (MPE), boosted MMI (bMMI), and state-level minimum Bayes risk (sMBR). This is demonstrated by selecting optimal parameter values through lattice generation and by adjusting learning rates. In the proposed framework, four different systems have been tested by analyzing the feature-extraction approaches (with or without speaker normalization of the test set through Vocal Tract Length Normalization (VTLN)) and the classification strategy, with or without artificial extension of the training dataset. To compare system performance, matched (adult train and test, S1; child train and test, S2) and mismatched (adult train and child test, S3; adult + child train and child test, S4) systems have been demonstrated on a large adult and a very small child Punjabi clean-speech corpus. Finally, gender-based in-domain data augmentation is used to moderate acoustic and phonetic variation between adult and children's speech under mismatched conditions.
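To make the front-end stage concrete, the following is a minimal pure-NumPy sketch of the classic MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, DCT). It is not the authors' implementation; the sampling rate, frame length, hop, and filterbank sizes are illustrative defaults, and production systems would typically use a toolkit such as Kaldi or librosa.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Minimal MFCC sketch; parameters are illustrative, not from the paper."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Per-frame power spectrum.
    pspec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    # Log mel energies, then DCT-II to decorrelate; keep first n_ceps.
    logmel = np.log(pspec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return logmel @ dct.T
```

VTLN can be grafted onto the same pipeline by warping the filterbank centre frequencies with a per-speaker factor before building `fbank`, which is why the paper treats it as a test-set normalization step rather than a separate feature type.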
The experimental results show that an effective framework built on the PNCC + VTLN front-end with a TDNN-sMBR-based model and parameter optimization yields relative improvements (RI) of 40.18%, 47.51%, and 49.87% in the matched, mismatched, and gender-based in-domain augmented systems, respectively, under typical clean and noisy conditions.
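For readers unfamiliar with the metric, relative improvement here is the fractional reduction in error rate with respect to a baseline. A quick illustration (the WER values below are hypothetical, not taken from the paper):

```python
def relative_improvement(wer_baseline, wer_system):
    """Relative reduction in word error rate, as a percentage."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline

# A hypothetical baseline WER of 20% reduced to 12% is a 40% RI:
print(relative_improvement(20.0, 12.0))  # 40.0
```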