Evaluating the Performance of Speaker Recognition Solutions in E-Commerce Applications.

Affiliation

Department for Information Technology, Faculty of Organizational Sciences, University of Belgrade, 11000 Belgrade, Serbia.

Publication information

Sensors (Basel). 2021 Sep 17;21(18):6231. doi: 10.3390/s21186231.

DOI:10.3390/s21186231
PMID:34577440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8473232/
Abstract

Two important tasks in many e-commerce applications are verifying the identity of the user accessing the system and determining the level of rights that the user has for accessing and manipulating the system's resources. The performance of these tasks depends directly on how reliably the user's identity can be established. The main research focus of this paper is a user identity verification approach based on voice recognition techniques. The paper presents research results on the use of open-source speaker recognition technologies in e-commerce applications, with an emphasis on evaluating the performance of the algorithms they use. Four open-source speaker recognition solutions (SPEAR, MARF, ALIZE, and HTK) were evaluated under mismatched conditions between the training and recognition phases. In practice, mismatched conditions are influenced by varying lengths of spoken sentences, different types of recording devices, and the use of different languages in the training and recognition phases. All tests in this research were performed in laboratory conditions using a specially designed framework for multimodal biometrics. The results are consistent with recent research showing that i-vectors and solutions based on probabilistic linear discriminant analysis (PLDA) continue to be the dominant speaker recognition approaches for text-independent tasks.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/809c/8473232/a56d549eac1b/sensors-21-06231-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/809c/8473232/7640a1c96b67/sensors-21-06231-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/809c/8473232/698f1ec3cf85/sensors-21-06231-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/809c/8473232/201c686532e8/sensors-21-06231-g003.jpg

Similar articles

1
Evaluating the Performance of Speaker Recognition Solutions in E-Commerce Applications.
Sensors (Basel). 2021 Sep 17;21(18):6231. doi: 10.3390/s21186231.
2
Cost-sensitive learning for emotion robust speaker recognition.
ScientificWorldJournal. 2014;2014:628516. doi: 10.1155/2014/628516. Epub 2014 Jun 4.
3
Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features.
Sensors (Basel). 2022 Dec 1;22(23):9370. doi: 10.3390/s22239370.
4
The effect of mismatched recording conditions on human and automatic speaker recognition in forensic applications.
Forensic Sci Int. 2004 Dec 2;146 Suppl:S95-9. doi: 10.1016/j.forsciint.2004.09.078.
5
A scalable formulation of probabilistic linear discriminant analysis: applied to face recognition.
IEEE Trans Pattern Anal Mach Intell. 2013 Jul;35(7):1788-94. doi: 10.1109/TPAMI.2013.38.
6
Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library.
Sensors (Basel). 2023 Feb 13;23(4):2082. doi: 10.3390/s23042082.
7
Phonetic variability constrained bottleneck features for joint speaker recognition and physical task stress detection.
J Acoust Soc Am. 2020 Nov;148(5):2912. doi: 10.1121/10.0002455.
8
Influence of emotional prosody, content, and repetition on memory recognition of speaker identity.
Q J Exp Psychol (Hove). 2021 Jul;74(7):1185-1201. doi: 10.1177/1747021821998557. Epub 2021 Mar 17.
9
Text-independent speaker verification using Minimal Resource Allocation Networks.
Int J Neural Syst. 2004 Dec;14(6):347-54. doi: 10.1142/S0129065704002108.
10
Discriminative analysis of lip motion features for speaker identification and speech-reading.
IEEE Trans Image Process. 2006 Oct;15(10):2879-91. doi: 10.1109/tip.2006.877528.

Cited by

1
Attention-Based Temporal-Frequency Aggregation for Speaker Verification.
Sensors (Basel). 2022 Mar 10;22(6):2147. doi: 10.3390/s22062147.