Netherlands Forensic Institute, PO Box 24044, 2490 AA The Hague, the Netherlands; University of Amsterdam, Science Park 904, 1098 XH Amsterdam, the Netherlands.
Netherlands Forensic Institute, PO Box 24044, 2490 AA The Hague, the Netherlands.
Sci Justice. 2024 Sep;64(5):485-497. doi: 10.1016/j.scijus.2024.07.001. Epub 2024 Jul 9.
Verifying who is speaking in a speech fragment can be crucial in attributing a crime to a suspect. The question can be addressed by comparing disputed and reference speech material within the likelihood ratio framework, the recommended and scientifically accepted approach for reporting evidential strength in court. In forensic practice, this verification task is usually carried out through auditory and acoustic analyses that consider a range of features, such as language competence, pronunciation, and other linguistic features. Automatic speaker comparison systems can be used alongside these manual analyses. State-of-the-art automatic speaker comparison systems are based on deep neural networks that take acoustic features as input. Linguistic analysis, however, may provide additional information. In this paper, we investigate whether, when, and how modern acoustic-based systems can be complemented by an authorship technique based on frequent words, within the likelihood ratio framework. We consider three approaches to deriving a combined likelihood ratio: using a support vector machine algorithm, fitting bivariate normal distributions, and passing the score of the acoustic system as an additional input to the frequent-word analysis. We apply our method to the forensically relevant dataset FRIDA and to the FISHER corpus, and we explore the conditions under which fusion is valuable. We evaluate our results in terms of the log likelihood ratio cost (Cllr) and the equal error rate (EER). We show that fusion can be beneficial, especially for intercepted phone calls with background noise.
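For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how the log likelihood ratio cost and the equal error rate are commonly computed from a set of likelihood ratios. It is not the authors' implementation; the function names `cllr` and `eer` and the toy inputs are assumptions made for illustration, and the EER here is a simple threshold-sweep approximation.

```python
import math


def cllr(lrs_same, lrs_diff):
    """Log likelihood ratio cost.

    lrs_same: likelihood ratios for same-speaker comparisons
    lrs_diff: likelihood ratios for different-speaker comparisons
    An uninformative system (all LRs equal to 1) gives a value of 1;
    lower is better.
    """
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in lrs_same) / len(lrs_same)
    penalty_diff = sum(math.log2(1 + lr) for lr in lrs_diff) / len(lrs_diff)
    return 0.5 * (penalty_same + penalty_diff)


def eer(scores_same, scores_diff):
    """Approximate equal error rate.

    Sweeps a decision threshold over every observed score and returns the
    mean of the false-accept and false-reject rates at the threshold where
    the two rates are closest.
    """
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(scores_same) | set(scores_diff)):
        # Same-speaker pairs scoring below the threshold are falsely rejected.
        frr = sum(s < t for s in scores_same) / len(scores_same)
        # Different-speaker pairs at or above the threshold are falsely accepted.
        far = sum(s >= t for s in scores_diff) / len(scores_diff)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate
```

Calling `cllr([100.0], [0.01])` gives a value close to zero because both likelihood ratios point strongly in the correct direction, whereas likelihood ratios that point the wrong way are penalised heavily. This is why the metric rewards well-calibrated evidential strength and not only correct ranking.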