Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition.

Authors

Finnian Kelly, John H. L. Hansen

Affiliation

Center for Robust Speech Systems (CRSS), University of Texas at Dallas, Richardson, TX 75083-0688 USA.

Publication

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:927-942. doi: 10.1109/taslp.2021.3053388. Epub 2021 Jan 21.

DOI:10.1109/taslp.2021.3053388
PMID:35783572
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9245507/
Abstract

Variations in vocal effort can create challenges for speaker recognition systems that are optimized for use with neutral speech. The Lombard effect and whisper are two commonly-occurring forms of vocal effort variation that result in non-neutral speech, the first due to noise exposure and the second due to intentional adjustment on the part of the speaker. In this article, a comparative evaluation of speaker recognition performance in non-neutral conditions is presented using multiple Lombard effect and whisper corpora. The detrimental impact of these vocal effort variations on discrimination and calibration performance on global, per-corpus, and per-speaker levels is explored using conventional error metrics, along with visual representations of the model and score spaces. A non-neutral speech detector is subsequently introduced and used to inform score calibration in several ways. Two calibration approaches are proposed and shown to reduce error to the same level as an optimal calibration approach that relies on ground-truth vocal effort information. This article contributes a generalizable methodology towards detecting vocal effort variation and using this knowledge to inform and advance speaker recognition system behavior.
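The abstract describes using a non-neutral speech detector to inform score calibration but gives no implementation details. As a rough illustration only, the sketch below fits a separate affine calibration (calibrated score = a * raw score + b, trained by logistic regression) for each vocal effort condition and then selects the calibration according to a condition label. The function names, the toy scores, and the use of a hard condition label standing in for the detector's output are all assumptions made for this example; they are not taken from the article.

```python
import math

def fit_affine_calibration(scores, labels, lr=0.1, steps=2000):
    """Fit calibrated = a * score + b by gradient descent on binary cross-entropy.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))  # predicted target probability
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, condition, params):
    """Apply the affine calibration chosen by the (assumed) condition label."""
    a, b = params[condition]
    return a * score + b

# Hypothetical toy scores: non-neutral trials are compressed and shifted
# relative to neutral trials, so one shared calibration would fit both poorly.
trials = {
    "neutral": ([4.0, 3.5, 5.0, -0.5, -3.0, -4.0, -2.5, 0.5],
                [1, 1, 1, 1, 0, 0, 0, 0]),
    "non_neutral": ([1.0, 0.5, 1.5, -0.8, -1.5, -2.0, -1.0, -0.2],
                    [1, 1, 1, 1, 0, 0, 0, 0]),
}

# One calibration per condition; in practice the condition would come from
# a detector rather than from ground-truth labels.
params = {cond: fit_affine_calibration(s, y) for cond, (s, y) in trials.items()}

if __name__ == "__main__":
    for cond, (a, b) in params.items():
        print(f"{cond}: a={a:.3f}, b={b:.3f}")
```

A hard switch between two calibrations is only one possible design; the detector's output could also be used as a soft weight or as an additional input to a single calibration model.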

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/9245507/42906444aa5a/nihms-1818194-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/9245507/b4a2237b6ee6/nihms-1818194-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/9245507/970b5dc41037/nihms-1818194-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e5ef/9245507/f67ffa9833b3/nihms-1818194-f0004.jpg