Revisiting the speaker discriminatory power of vowel formant frequencies under a likelihood ratio-based paradigm: The case of mismatched speaking styles.

Authors

Cavalcanti Julio Cesar, Eriksson Anders, Barbosa Plinio A, Madureira Sandra

Affiliations

Department of Linguistics, Stockholm University, Stockholm, Sweden.

Applied Linguistics and Language Studies Graduate Program, Pontifical Catholic University of São Paulo, São Paulo, Brazil.

Publication information

PLoS One. 2024 Dec 10;19(12):e0311363. doi: 10.1371/journal.pone.0311363. eCollection 2024.

Abstract

Differentiating subjects through the comparison of their recorded speech is a common endeavor in speaker characterization. When using an acoustic-based approach, this task typically involves scrutinizing specific acoustic parameters and assessing their discriminatory capacity. This experimental study aimed to evaluate the speaker discriminatory power of vowel formants (resonance peaks in the vocal tract) in two different speaking styles: Dialogue and Interview. Different testing procedures were applied, specifically metrics compatible with the likelihood ratio paradigm. Only high-quality recordings were analyzed in this study. The participants were 20 male Brazilian Portuguese (BP) speakers from the same dialectal area. Two speaker-discriminatory power estimates were examined through Multivariate Kernel Density analysis: log-likelihood-ratio cost (Cllr) and equal error rate (EER). As expected, the discriminatory performance was stronger for style-matched analyses than for mismatched-style analyses. In order of relevance, F3, F4, and F1 performed the best in style-matched comparisons, as suggested by lower Cllr and EER values. F2 performed the worst within each style, in both Dialogue and Interview. The discriminatory power of all individual formants (F1-F4) appeared to be affected in the mismatched condition, demonstrating that discriminatory power is sensitive to style-driven changes in speech production. The combination of higher formants 'F3 + F4' outperformed the combination of lower formants 'F1 + F2'. However, in mismatched-style analyses, the magnitude of improvement in Cllr and EER scores increased as more formants were incorporated into the model. The best discriminatory performance was achieved when most formants were combined. Applying multivariate analysis not only reduced average Cllr and EER scores but also influenced the overall probability distribution, shifting the probability density towards lower Cllr and EER values. In general, front and central vowels were found to be more speaker-discriminatory than back vowels as far as the 'F1 + F2' relation was concerned.
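For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Cllr and EER are commonly computed from a set of likelihood ratios. It is not the authors' code: it assumes the likelihood ratios have already been produced by some system (in the study, a Multivariate Kernel Density analysis), and the function names `cllr` and `eer` and the input arrays are hypothetical. Lower values of both metrics indicate better discrimination; a system that always outputs LR = 1 has a Cllr of exactly 1.

```python
import numpy as np


def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost (standard definition).

    lr_same: likelihood ratios from same-speaker comparisons.
    lr_diff: likelihood ratios from different-speaker comparisons.
    """
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Same-speaker comparisons are penalised when the LR is small.
    penalty_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    # Different-speaker comparisons are penalised when the LR is large.
    penalty_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (penalty_same + penalty_diff)


def eer(lr_same, lr_diff):
    """Approximate equal error rate from a simple threshold sweep.

    Uses every observed log10 LR as a candidate threshold and returns
    the error rate at the point where miss and false-alarm rates are
    closest (an approximation on small samples).
    """
    same = np.log10(np.asarray(lr_same, dtype=float))
    diff = np.log10(np.asarray(lr_diff, dtype=float))
    thresholds = np.sort(np.concatenate([same, diff]))
    # Miss: same-speaker pair scored below the threshold.
    miss = np.array([np.mean(same < t) for t in thresholds])
    # False alarm: different-speaker pair scored at or above the threshold.
    false_alarm = np.array([np.mean(diff >= t) for t in thresholds])
    i = np.argmin(np.abs(miss - false_alarm))
    return 0.5 * (miss[i] + false_alarm[i])


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    same_speaker_lrs = [100.0, 1000.0]
    diff_speaker_lrs = [0.01, 0.001]
    print("Cllr:", cllr(same_speaker_lrs, diff_speaker_lrs))
    print("EER:", eer(same_speaker_lrs, diff_speaker_lrs))
```

The threshold sweep keeps the EER estimate easy to follow; on larger score sets an interpolated or convex-hull estimate is often preferred, which this sketch does not attempt.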

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f813/11630611/11e22b62ba12/pone.0311363.g001.jpg
