Speech, Language and Cognition Laboratory, School of English, University of Hong Kong, Hong Kong.
Department of English and Communication, Hong Kong Polytechnic University, Hong Kong.
Forensic Sci Int. 2024 Oct;363:112199. doi: 10.1016/j.forsciint.2024.112199. Epub 2024 Aug 22.
A growing number of studies in forensic voice comparison have explored how elements of phonetic analysis and automatic speaker recognition systems may be integrated for optimal speaker discrimination performance. However, few studies have investigated the evidential value of long-term speech features using forensically relevant speech data. This paper reports an empirical validation study that assesses the evidential strength of the following long-term features: fundamental frequency (F0), formant distributions, laryngeal voice quality, mel-frequency cepstral coefficients (MFCCs), and combinations thereof. Non-contemporaneous recordings with speech style mismatch from 75 male Australian English speakers were analyzed. Results show that 1) MFCCs outperform long-term acoustic-phonetic features; 2) source and filter features provide little complementary speaker-specific information; and 3) adding long-term phonetic features to an MFCC-based system does not lead to meaningful improvement in system performance. Implications for the complementarity of phonetic analysis and automatic speaker recognition systems are discussed.