Hybrid machine learning classification scheme for speaker identification.

Author information

V Karthikeyan, S Suja Priyadharsini

Affiliations

Department of Electronics and Communication Engineering, Kalasalingam Institute of Technology, Srivilliputhur, Tamilnadu, 626126, India.

Department of Electronics and Communication Engineering, Anna University Regional Campus-Tirunelveli, Tirunelveli, Tamilnadu, 627007, India.

Publication information

J Forensic Sci. 2022 May;67(3):1033-1048. doi: 10.1111/1556-4029.15006. Epub 2022 Feb 9.

Abstract

Motivated by the need to prepare for the next generation of "Automatic Spokesperson Recognition" (ASR) systems, this paper applies fused spectral features with a hybrid machine learning (ML) strategy to the field of speech communication. The strategy combines several spectral features: mel-frequency cepstral coefficients (MFCCs), spectral kurtosis, spectral skewness, normalized pitch frequency (NPF), and formants. The proposed classification method could serve in advanced speaker identification scenarios. Particular attention is given to a hybrid ML scheme capable of identifying unknown speakers, a speaker-identification classifier called "Random Forest-Support Vector Machine" (RF-SVM). The extracted speaker-specific spectral attributes are fed to the hybrid RF-SVM classifier to identify or verify a particular speaker. The work aims to construct an ensemble of decision trees over a bounded region with minimal misclassification error, using a hybrid ensemble RF-SVM strategy. The method is functionally tested on a series of standard and real-time speaker databases under various noise conditions, and its performance is compared with other state-of-the-art methods. On the recorded experimental dataset, the proposed fusion method achieves a high identification rate (97% on average) and a lower equal error rate (EER) (<2%) than the individual schemes. The robustness of the classifier is validated on the standard ELSDSR, TIMIT, and NIST audio datasets, on which the hybrid classifier achieves accuracies of 98%, 99%, and 94% and EERs of 2%, 1%, and 2%, respectively. These results compare favorably with those of other well-known speaker recognition schemes.
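
Spectral skewness and spectral kurtosis are two of the fused features named above. The abstract does not give their formulas, so the sketch below assumes the common spectral-shape definition: the magnitude spectrum of one frame is treated as a weighted distribution over frequency, and skewness and kurtosis are its third and fourth standardized moments. The helper names `magnitude_spectrum` and `spectral_shape` are hypothetical, and the naive O(N²) DFT is used only to keep the example dependency-free; this is not the authors' feature extractor.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """One-sided magnitude spectrum via a naive DFT (O(N^2); fine for a sketch)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_shape(frame, sample_rate):
    """Return (centroid, spread, skewness, kurtosis) of one frame's spectrum.

    Assumption: moments are weighted by spectral magnitude; other definitions
    (e.g. power-weighted) give different values.
    """
    mags = magnitude_spectrum(frame)
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]  # bin centre in Hz
    total = sum(mags)
    if total == 0:
        raise ValueError("silent frame: spectral moments are undefined")
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total

    def central(p):
        # p-th central moment of frequency, weighted by magnitude
        return sum((f - centroid) ** p * m for f, m in zip(freqs, mags)) / total

    spread = math.sqrt(central(2))
    if spread == 0:
        raise ValueError("all energy in one bin: skewness/kurtosis undefined")
    skewness = central(3) / spread ** 3
    kurtosis = central(4) / spread ** 4
    return centroid, spread, skewness, kurtosis
```

For example, a single-sample impulse has a flat magnitude spectrum, so its spectral skewness is 0; a frame whose energy is concentrated at low frequencies (such as a two-sample boxcar) has positive skewness.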

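The EER figures quoted above refer to the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). A minimal sketch of how such a number can be estimated from verification scores follows, assuming that higher scores mean "same speaker" and approximating the crossing point by sweeping every observed score as a threshold. The function names are hypothetical, and this is not the authors' evaluation code.

```python
import math


def far_frr(genuine, impostor, threshold):
    """False-acceptance and false-rejection rates at a given score threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)  # true speakers rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER as the point where FAR and FRR are closest.

    Returns (eer, threshold), with eer taken as the mean of FAR and FRR there.
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best = None
    # Candidate thresholds: every observed score, plus +inf (reject everything)
    for t in sorted(set(genuine) | set(impostor)) + [math.inf]:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.6]    # scores from true-speaker trials
    impostor = [0.1, 0.2, 0.3, 0.65]  # scores from impostor trials
    print(equal_error_rate(genuine, impostor))  # (0.25, 0.65)
```

With many trials, the discrete sweep converges toward the usual EER read off a DET or ROC curve; interpolating between the two thresholds that bracket the crossing is a common refinement.
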