Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features.

Affiliations

Institute of Optoelectronics, Military University of Technology, 2 Kaliski Street, 00-908 Warsaw, Poland.

BITRES Sp. z o.o., 9/2 Chałubiński Street, 02-004 Warsaw, Poland.

Publication information

Sensors (Basel). 2022 Dec 1;22(23):9370. doi: 10.3390/s22239370.

DOI:10.3390/s22239370

PMID:36502072

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9738489/

Abstract

This article presents the Automatic Speaker Recognition System (ASR System), which successfully resolves problems such as identification within an open set of speakers and speaker verification under difficult recording conditions similar to those of telephone transmission. The article describes in full the architecture of the ASR System's internal processing modules. The proposed system was compared closely with competing systems on a known certified voice dataset and achieved improved speaker identification and verification results. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters. The proprietary feature generation process and the corresponding classification process based on Gaussian mixture models also contributed. The result is a system that makes an important contribution to the state of the art in speaker recognition for telephone transmission applications with known speech coding standards.
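The open-set identification mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each enrolled speaker is described by a diagonal-covariance Gaussian mixture model over feature vectors (toy 2-D vectors here, where the paper uses cepstral features), and that a test utterance is rejected as "unknown" when even the best-scoring model falls below a threshold. The names `gmm_log_likelihood`, `identify_open_set`, `models`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log of (mixture weight * diagonal Gaussian density)
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)

def identify_open_set(frames, speaker_models, threshold):
    """Return (best speaker or None if rejected, all scores)."""
    scores = {
        name: gmm_log_likelihood(frames, *model)
        for name, model in speaker_models.items()
    }
    best = max(scores, key=scores.get)
    # Open-set decision: accept only if the best score clears the threshold.
    accepted = best if scores[best] >= threshold else None
    return accepted, scores

# Hypothetical toy models: (weights, means, variances) per speaker.
models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}

known, _ = identify_open_set([[0.5, 0.5], [0.4, 0.6]], models, threshold=-5.0)
unknown, _ = identify_open_set([[20.0, 20.0]], models, threshold=-5.0)
```

In this sketch `known` resolves to an enrolled speaker and `unknown` to `None`, because the far-away frame scores poorly under every model. In a real system the threshold would be tuned on held-out data, and the paper reports using genetic algorithms to select features and optimize internal parameters, which this example does not attempt.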

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a777/9738489/a22b2f76e7e1/sensors-22-09370-g001.jpg

Similar articles

Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria.

Sensors (Basel). 2021 Sep 27;21(19):6460. doi: 10.3390/s21196460.

Multi-resolution speech analysis for automatic speech recognition using deep neural networks: Experiments on TIMIT.

PLoS One. 2018 Oct 10;13(10):e0205355. doi: 10.1371/journal.pone.0205355. eCollection 2018.

Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

Neural Netw. 2016 Jun;78:97-111. doi: 10.1016/j.neunet.2015.12.010. Epub 2015 Dec 30.

Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation.

PeerJ Comput Sci. 2024 Mar 29;10:e1973. doi: 10.7717/peerj-cs.1973. eCollection 2024.

A Robust Speaker Identification System Using the Responses from a Model of the Auditory Periphery.

PLoS One. 2016 Jul 8;11(7):e0158520. doi: 10.1371/journal.pone.0158520. eCollection 2016.

New transformed features generated by deep bottleneck extractor and a GMM-UBM classifier for speaker age and gender classification.

Neural Comput Appl. 2018;30(8):2581-2593. doi: 10.1007/s00521-017-2848-4. Epub 2017 Jan 17.

Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition.

Int J Speech Technol. 2013;16(3):313-322. doi: 10.1007/s10772-012-9184-y. Epub 2012 Dec 18.

Toward Realigning Automatic Speaker Verification in the Era of COVID-19.

Sensors (Basel). 2022 Mar 30;22(7):2638. doi: 10.3390/s22072638.

One-against-all weighted dynamic time warping for language-independent and speaker-dependent speech recognition in adverse conditions.

PLoS One. 2014 Feb 10;9(2):e85458. doi: 10.1371/journal.pone.0085458. eCollection 2014.
