Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings.

Author affiliations

Faculty of Electrical Engineering and Informatics, Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Magyar tudósok körútja 2, Budapest, 1117, Hungary.

Doctoral School of Law Enforcement, Hungarian National University of Public Service, H-1083 Budapest, 2 Ludovika tér, Budapest, H-1441, Hungary.

Publication information

J Forensic Sci. 2023 May;68(3):871-883. doi: 10.1111/1556-4029.15250. Epub 2023 Mar 31.

DOI:10.1111/1556-4029.15250
PMID:36999742
Abstract

In forensic voice comparison, deep learning has recently become widely used, mainly to learn speaker representations called embeddings or embedding vectors. Speaker embeddings are often trained on corpora that mostly contain widely spoken languages. Language dependency is therefore an important factor in automatic forensic voice comparison, especially when the target language is linguistically very different from the language the model was trained on. For a low-resource language, developing a forensic corpus with enough speakers to train deep learning models is costly. This study investigates whether a model pre-trained on a multilingual (mostly English) corpus can be used on a low-resource target language (here, Hungarian) that is not represented in the model. Often, multiple samples are not available from the offender (unknown speaker). Samples are therefore compared pairwise, with and without speaker enrollment for suspect (known) speakers. Two corpora developed specifically for forensic purposes are used, along with a third intended for traditional speaker verification. Speaker embedding vectors are extracted with the x-vector and ECAPA-TDNN techniques. Speaker verification is evaluated in the likelihood-ratio (LR) framework. Language combinations across modeling, LR calibration, and evaluation are compared. Results are evaluated with the log-likelihood-ratio cost (Cllr) and equal error rate (EER) metrics. The model pre-trained on a different language, but on a corpus with a large number of speakers, was found to be usable on samples with language mismatch. Sample duration and speaking style also appear to affect performance.
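
The abstract names two evaluation metrics, Cllr and EER, and a pairwise comparison with optional speaker enrollment. The following is a minimal sketch of how these quantities are commonly computed, not the authors' implementation. Assumptions: comparisons are scored by cosine similarity, enrollment averages the known speaker's embeddings, and the inputs to `cllr` are already calibrated natural-log likelihood ratios. The function names `cosine`, `enroll`, `cllr`, and `eer` are hypothetical, and the calibration step itself is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors (assumed scoring rule)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Average several embeddings of a known speaker into one enrollment vector."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(llr_same, llr_diff):
    """Log-likelihood-ratio cost from natural-log LRs.

    Same-speaker pairs are penalised by log2(1 + 1/LR),
    different-speaker pairs by log2(1 + LR).
    """
    cost_same = sum(_softplus(-x) for x in llr_same) / len(llr_same)
    cost_diff = sum(_softplus(x) for x in llr_diff) / len(llr_diff)
    return 0.5 * (cost_same + cost_diff) / math.log(2.0)


def eer(scores_same, scores_diff):
    """Approximate equal error rate by sweeping every observed score as a threshold."""
    best_gap, best_rate = None, None
    for t in sorted(list(scores_same) + list(scores_diff)):
        # False rejection: same-speaker pairs scoring below the threshold.
        frr = sum(s < t for s in scores_same) / len(scores_same)
        # False acceptance: different-speaker pairs scoring at or above it.
        far = sum(s >= t for s in scores_diff) / len(scores_diff)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, 0.5 * (far + frr)
    return best_rate


if __name__ == "__main__":
    # Toy two-dimensional "embeddings", purely illustrative.
    known = enroll([[1.0, 0.0], [0.0, 1.0]])
    print(cosine(known, [1.0, 1.0]))
    # An uninformative system (every LR equal to 1) has Cllr = 1.
    print(cllr([0.0], [0.0]))
    print(eer([2.0, 3.0], [-3.0, -2.0]))
```

A Cllr below 1 indicates that the LRs carry useful information, and lower values of both metrics are better. The threshold sweep in `eer` is a simple approximation; interpolating between neighbouring operating points is a common refinement.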

Similar articles

1
Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings.
J Forensic Sci. 2023 May;68(3):871-883. doi: 10.1111/1556-4029.15250. Epub 2023 Mar 31.
2
Detecting Vocal Fatigue with Neural Embeddings.
J Voice. 2023 Feb 9. doi: 10.1016/j.jvoice.2023.01.012.
3
Speaker identification in courtroom contexts - Part I: Individual listeners compared to forensic voice comparison based on automatic-speaker-recognition technology.
Forensic Sci Int. 2022 Dec;341:111499. doi: 10.1016/j.forsciint.2022.111499. Epub 2022 Oct 15.
4
Data strategies in forensic automatic speaker comparison.
Forensic Sci Int. 2023 Sep;350:111790. doi: 10.1016/j.forsciint.2023.111790. Epub 2023 Jul 20.
5
Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization.
Sensors (Basel). 2024 Jun 29;24(13):4229. doi: 10.3390/s24134229.
6
Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
7
The impact in forensic voice comparison of lack of calibration and of mismatched conditions between the known-speaker recording and the relevant-population sample recordings.
Forensic Sci Int. 2018 Feb;283:e1-e7. doi: 10.1016/j.forsciint.2017.12.024. Epub 2017 Dec 19.
8
Ensemble learning with speaker embeddings in multiple speech task stimuli for depression detection.
Front Neurosci. 2023 Mar 23;17:1141621. doi: 10.3389/fnins.2023.1141621. eCollection 2023.
9
Combination of deep speaker embeddings for diarisation.
Neural Netw. 2021 Sep;141:372-384. doi: 10.1016/j.neunet.2021.04.020. Epub 2021 Apr 21.
10
Validations of an alpha version of the E Forensic Speech Science System (EFS) core software tools.
Forensic Sci Int Synerg. 2022 Mar 7;4:100223. doi: 10.1016/j.fsisyn.2022.100223. eCollection 2022.

Cited by

1
Exploring the performance of automatic speaker recognition using twin speech and deep learning-based artificial neural networks.
Front Artif Intell. 2024 Feb 8;7:1287877. doi: 10.3389/frai.2024.1287877. eCollection 2024.