Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.

Author information

Cavalcanti Julio Cesar, da Silva Ronaldo Rodrigues, Eriksson Anders, Barbosa Plinio A

Affiliations

Laboratory of Phonetics, Department of Linguistics, Stockholm University, Stockholm, Sweden.

Integrated Acoustic Analysis and Cognition Laboratory, Pontifical Catholic University of São Paulo, São Paulo, Brazil.

Publication information

Front Artif Intell. 2024 Feb 8;7:1287877. doi: 10.3389/frai.2024.1287877. eCollection 2024.

DOI:10.3389/frai.2024.1287877

PMID:38405218

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10885345/

Abstract

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male speakers, all identical twins, engaged in spontaneous dialogues and interviews. Performance was evaluated under three conditions: comparisons between identical twins, comparisons among all speakers in the dataset (including twin pairs), and comparisons among all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using the equal error rate (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, leading to a decrease in overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those based on shorter ones. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity or dissimilarity from longer speech stretches than from shorter ones. The study also uncovered varying degrees of likeness among identical twins, with certain pairs presenting a greater challenge for ASR systems than others. These outcomes align with prior research and are discussed in the context of the relevant literature.
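The abstract reports performance as EER and Cllr. The sketch below shows one common way to compute both metrics from same-speaker (target) and different-speaker (non-target) comparison scores. It is not the authors' code and not part of the SpeechBrain API, and the function names are hypothetical. The EER function works on raw similarity scores. The Cllr function assumes the scores have already been calibrated into natural-log likelihood ratios. The abstract does not describe a calibration step, so that input format is an assumption.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: the operating point where the false-rejection rate
    (target scores below the threshold) equals the false-acceptance rate
    (non-target scores at or above the threshold)."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold
    thresholds = np.sort(np.concatenate([tgt, non]))
    frr = np.array([(tgt < t).mean() for t in thresholds])
    far = np.array([(non >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest and average them
    i = np.argmin(np.abs(frr - far))
    return (frr[i] + far[i]) / 2


def compute_cllr(target_llrs, nontarget_llrs):
    """Log-likelihood-ratio cost, assuming inputs are natural-log LRs.

    Cllr = 0.5 * [mean(log2(1 + 1/LR_target)) + mean(log2(1 + LR_nontarget))]
    """
    tgt = np.asarray(target_llrs, dtype=float)
    non = np.asarray(nontarget_llrs, dtype=float)
    # log2(1 + exp(-llr)) computed stably via logaddexp(0, -llr) / ln 2
    c_tgt = np.mean(np.logaddexp(0.0, -tgt)) / np.log(2)
    # log2(1 + exp(llr)) for different-speaker comparisons
    c_non = np.mean(np.logaddexp(0.0, non)) / np.log(2)
    return 0.5 * (c_tgt + c_non)


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the study
    same_speaker = [0.9, 0.8, 0.7, 0.3]
    different_speaker = [0.1, 0.2, 0.4, 0.75]
    print(compute_eer(same_speaker, different_speaker))  # 0.25
    # An uninformative system (every LR = 1, i.e. log-LR = 0) gives Cllr = 1
    print(compute_cllr([0.0, 0.0], [0.0, 0.0]))
```

Lower values are better for both metrics. EER depends only on how well the two score distributions are separated. Cllr also penalizes poorly calibrated likelihood ratios. A Cllr of 1 marks a system that provides no useful evidence.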
