Hybrid machine learning classification scheme for speaker identification.

V Karthikeyan, S Suja Priyadharsini

Department of Electronics and Communication Engineering, Kalasalingam Institute of Technology, Srivilliputhur, Tamilnadu, 626126, India.

Department of Electronics and Communication Engineering, Anna University Regional Campus-Tirunelveli, Tirunelveli, Tamilnadu, 627007, India.

J Forensic Sci. 2022 May;67(3):1033-1048. doi: 10.1111/1556-4029.15006. Epub 2022 Feb 9.

Motivated by the need to prepare for the next generation of "Automatic Spokesperson Recognition" (ASR) systems, this paper applies fused spectral features with a hybrid machine learning (ML) strategy to the speech communication field. The strategy combines spectral features including mel-frequency cepstral coefficients (MFCCs), spectral kurtosis, spectral skewness, normalized pitch frequency (NPF), and formants. The proposed classification method may serve in advanced speaker identification scenarios. Particular attention is given to a hybrid ML scheme for identifying unknown speakers, the "Random Forest-Support Vector Machine" (RF-SVM) classifier. The extracted speaker-specific spectral attributes are fed to the hybrid RF-SVM classifier to identify or verify a particular speaker. The work aims to construct an ensemble of decision trees over a bounded region with minimal misclassification error using the hybrid RF-SVM ensemble strategy. The method is tested on a series of standard and real-time speaker databases under several noise conditions, and its performance is compared with other state-of-the-art mechanisms. On the recorded experimental dataset, the proposed fusion method achieves a higher identification rate (97% on average) and a lower equal error rate (EER < 2%) than the individual schemes. The robustness of the classifier is validated on the standard ELSDSR, TIMIT, and NIST audio datasets, where the hybrid classifier achieves 98%, 99%, and 94% accuracy with EERs of 2%, 1%, and 2%, respectively. These results are compared with other well-known speaker recognition schemes, and the proposed method is found to be superior.
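The abstract names spectral skewness and spectral kurtosis as features and reports results as equal error rates, but gives no formulas for either. The following is a minimal Python sketch under stated assumptions, not the authors' code. The spectral moments follow the common MATLAB-style definition that treats the magnitude spectrum as a weight distribution over bin frequencies (the paper may use a power spectrum or another normalization). The EER is found by sweeping a threshold over all observed scores, accepting a trial when its score is at or above the threshold, and averaging FAR and FRR where they are closest. The function names `magnitude_spectrum`, `spectral_moments`, and `equal_error_rate`, the sample rate, and all scores are hypothetical illustrations. The naive DFT is O(N^2) and only suitable for short demo frames.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2. O(N^2): for short demo frames only."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_moments(spectrum, sample_rate):
    """Return (centroid, spread, skewness, kurtosis).

    Centroid and spread are in Hz; skewness and kurtosis are dimensionless.
    Assumption: the magnitude spectrum is used as a weight distribution over the
    bin frequencies (MATLAB-style definition); the paper's exact variant is unknown.
    """
    n_fft = 2 * (len(spectrum) - 1)  # even frame length that produced bins 0..N/2
    freqs = [k * sample_rate / n_fft for k in range(len(spectrum))]
    total = sum(spectrum)
    centroid = sum(f * s for f, s in zip(freqs, spectrum)) / total
    spread = math.sqrt(sum((f - centroid) ** 2 * s for f, s in zip(freqs, spectrum)) / total)
    skewness = sum((f - centroid) ** 3 * s for f, s in zip(freqs, spectrum)) / (total * spread ** 3)
    kurtosis = sum((f - centroid) ** 4 * s for f, s in zip(freqs, spectrum)) / (total * spread ** 4)
    return centroid, spread, skewness, kurtosis


def equal_error_rate(genuine, impostor):
    """Sweep thresholds over all observed scores (accept if score >= threshold)
    and return the mean of FAR and FRR at the threshold where they are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors wrongly accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # targets wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate
    # Two equal-amplitude tones that fall exactly on bins 8 and 16 of a 64-sample frame.
    frame = [math.sin(2 * math.pi * 1000 * t / sr) + math.sin(2 * math.pi * 2000 * t / sr)
             for t in range(64)]
    print(spectral_moments(magnitude_spectrum(frame), sr))
    genuine = [0.9, 0.8, 0.85, 0.7, 0.95]  # hypothetical target-speaker scores
    impostor = [0.1, 0.3, 0.2, 0.75, 0.4]  # hypothetical non-target scores
    print(equal_error_rate(genuine, impostor))
```

For the two-tone frame, the spectrum puts equal weight on 1000 Hz and 2000 Hz, so the centroid comes out at about 1500 Hz, the spread at about 500 Hz, the skewness near 0, and the kurtosis near 1. For the toy scores, FAR and FRR meet at 0.2, giving an EER of 0.2 (20%). In a real system the scores would come from the classifier (for example, per-speaker probabilities from an RF-SVM model), and the features would be computed per short-time frame rather than on one synthetic frame.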

出于为下一代“自动发言人识别”（ASR）系统做准备的需求，本文将融合光谱特征与混合机器学习（ML）策略应用于语音通信领域。该策略涉及组合光谱特征，如梅尔频率倒谱系数（MFCCs）、光谱峰度、光谱偏度、归一化基音频率（NPF）和共振峰。所建议的分类方法的特征可能适用于高级说话人识别场景。特别关注了能够通过称为“随机森林 - 支持向量机”（RF - SVM）的说话人身份检测分类器技术找到未知说话人的混合ML方案。提取的说话人精确光谱属性应用于混合RF - SVM分类器以识别/验证特定说话人。这项工作旨在使用混合集成RF - SVM策略在有界区域构建一个具有最小误分类误差的集成决策树。对一系列标准的实时说话人数据库和噪声条件进行功能测试，以验证其与其他现有技术机制相比的性能。与记录的实验数据集的单个方案相比，所提出的融合方法在说话人识别任务中成功实现了高识别率（平均97%）和较低的等错误率（EER）（<2%）。使用标准的ELSDSR、TIMIT和NIST音频数据集验证了分类器的鲁棒性。在ELSDSR、TIMIT和NIST数据集上的实验表明，混合分类器的准确率分别为98%、99%和94%，EER分别为2%、1%和2%。然后将这些结果与其他知名的说话人识别方案进行比较，发现该方法更具优势。

Similar articles

Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.

Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.

A Two-Level Speaker Identification System via Fusion of Heterogeneous Classifiers and Complementary Feature Cooperation.

Sensors (Basel). 2021 Jul 28;21(15):5097. doi: 10.3390/s21155097.

Inter classifier comparison to detect voice pathologies.

Math Biosci Eng. 2021 Mar 5;18(3):2258-2273. doi: 10.3934/mbe.2021114.

Development of High Accuracy Classifier for the Speaker Recognition System.

Appl Bionics Biomech. 2021 May 19;2021:5559616. doi: 10.1155/2021/5559616. eCollection 2021.

Information fusion and multi-classifier system for miner fatigue recognition in plateau environments based on electrocardiography and electromyography signals.

Comput Methods Programs Biomed. 2021 Nov;211:106451. doi: 10.1016/j.cmpb.2021.106451. Epub 2021 Oct 2.

Effect on speech emotion classification of a feature selection approach using a convolutional neural network.

PeerJ Comput Sci. 2021 Nov 3;7:e766. doi: 10.7717/peerj-cs.766. eCollection 2021.

Urban Tree Species Classification Using a WorldView-2/3 and LiDAR Data Fusion Approach and Deep Learning.

Sensors (Basel). 2019 Mar 14;19(6):1284. doi: 10.3390/s19061284.

New transformed features generated by deep bottleneck extractor and a GMM-UBM classifier for speaker age and gender classification.

Neural Comput Appl. 2018;30(8):2581-2593. doi: 10.1007/s00521-017-2848-4. Epub 2017 Jan 17.

Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech.

Neural Netw. 2016 Jun;78:97-111. doi: 10.1016/j.neunet.2015.12.010. Epub 2015 Dec 30.