
Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.

Author information

Cavalcanti Julio Cesar, da Silva Ronaldo Rodrigues, Eriksson Anders, Barbosa Plinio A

Affiliations

Laboratory of Phonetics, Department of Linguistics, Stockholm University, Stockholm, Sweden.

Integrated Acoustic Analysis and Cognition Laboratory, Pontifical Catholic University of São Paulo, São Paulo, Brazil.

Publication information

Front Artif Intell. 2024 Feb 8;7:1287877. doi: 10.3389/frai.2024.1287877. eCollection 2024.

DOI:10.3389/frai.2024.1287877
PMID:38405218
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10885345/
Abstract

This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system built with the SpeechBrain toolkit. The dataset comprised recordings of 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance was evaluated in three comparison conditions: identical twins only, all speakers in the dataset (including twin pairs), and all speakers excluding twin pairs. Speech samples ranging from 5 to 30 s were assessed using the equal error rate (EER) and the log-likelihood-ratio cost (Cllr). The results highlight the substantial challenge that identical twins pose to the ASR system, reducing overall speaker recognition accuracy. Furthermore, analyses based on longer speech samples outperformed those based on shorter samples. As sample length increased, the standard deviations of both intra- and inter-speaker similarity scores decreased, indicating less variability in estimating speaker similarity or dissimilarity from longer speech stretches than from shorter ones. The study also found varying degrees of likeness among identical twins, with certain pairs posing a greater challenge for ASR systems. These outcomes align with prior research and are discussed in the context of the relevant literature.
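The abstract reports results as EER and Cllr. The following is a minimal sketch of how these two standard metrics can be computed from lists of comparison scores. It is not the authors' SpeechBrain pipeline. Several assumptions apply. Higher scores are taken to mean "more likely the same speaker." `cllr` expects calibrated natural-log likelihood ratios, which raw similarity scores only become after a calibration step such as logistic regression. The function names and all score values are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate the EER by trying every observed score as a threshold.

    target: same-speaker comparison scores; nontarget: different-speaker scores.
    A comparison is accepted as same-speaker when score >= threshold.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        frr = sum(s < t for s in target) / len(target)         # false rejections (misses)
        far = sum(s >= t for s in nontarget) / len(nontarget)  # false acceptances
        if abs(far - frr) < best_gap:
            # Keep the threshold where the two error rates are closest; report their mean.
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x), avoiding overflow for large |x|."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2)


def cllr(target_llrs: Sequence[float], nontarget_llrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost; inputs are natural-log likelihood ratios.

    Same-speaker trials are penalised for low LRs, different-speaker trials
    for high LRs. An uninformative system (all LLRs = 0) scores exactly 1.
    """
    ss = sum(_log2_1p_exp(-l) for l in target_llrs) / len(target_llrs)
    ds = sum(_log2_1p_exp(l) for l in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (ss + ds)


if __name__ == "__main__":
    # Hypothetical similarity scores, not data from the study.
    same = [0.82, 0.75, 0.64, 0.91, 0.58]
    diff = [0.31, 0.45, 0.62, 0.12, 0.27]
    print(equal_error_rate(same, diff))  # → 0.2
    # Hypothetical calibrated log-LRs for the same kind of trials.
    print(cllr([2.1, 1.4, 0.3, 3.0, -0.2], [-2.5, -1.1, 0.4, -3.2, -1.8]))
```

A lower EER and a lower Cllr both indicate better performance. Cllr additionally penalises poorly calibrated likelihood ratios, which is why it is commonly reported alongside EER in forensic speaker comparison. In this example, a single twin-like different-speaker score (0.62) overlapping the same-speaker range is enough to push the EER from 0 to 0.2.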

