Netherlands Forensic Institute, Laan van Ypenburg 6, 2497 GB The Hague, the Netherlands.
Forensic Sci Int. 2023 Sep;350:111790. doi: 10.1016/j.forsciint.2023.111790. Epub 2023 Jul 20.
Automatic speaker recognition (ASR) is a method used in forensic speaker comparison (FSC) casework. It requires collections of audio data that are representative of the case audio, both to perform reference normalization and to train a score-to-LR (likelihood ratio) function. Each of these purposes needs audio from a certain minimum number of speakers for ASR performance to be relatively stable. Although no hard cut-off can be set, for the purpose of this work the minimum was set at 30 speakers for each purpose, and hence 60 for both. Lack of representative data from that many speakers, and uncertainty about what exactly constitutes representative data, are major reasons for not employing ASR in FSC. An experiment was carried out simulating a situation in which a practitioner has only 30 speakers available. Several data strategies were tried to handle the lack of data: omitting reference normalization; splitting the 30 speakers into two groups of 15 (ignoring the minimum of 30); and a leave-1-or-2-out strategy in which all 30 speakers are used for both reference normalization and calibration. These were compared with the baseline situation in which the practitioner does have the required 60 speakers. The leave-1-or-2-out strategy with 30 speakers performed on par with the baseline, and extending that strategy to the full 60 speakers even outperformed the baseline. This shows that a strategy that halves the data need is viable, which lessens the data requirements for ASR in FSC and makes the use of ASR possible in more cases.
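The abstract does not spell out how the leave 1 or 2 out strategy works in practice. The following is a minimal sketch of one plausible reading, under an assumption not confirmed by the text: for every comparison, the speakers involved in it (one for a same-speaker comparison, two for a different-speaker comparison) are excluded, and all remaining speakers serve as the shared pool for both reference normalization and calibration. The function name `leave_out_splits` and the speaker labels are hypothetical, and no scoring or calibration is performed here.

```python
from itertools import combinations_with_replacement


def leave_out_splits(speakers):
    """Yield ((a, b), pool) for every unordered speaker pair.

    ASSUMPTION (not stated in the abstract): the pool used for reference
    normalization and calibration is every speaker except those taking
    part in the comparison itself.
    """
    for a, b in combinations_with_replacement(speakers, 2):
        involved = {a, b}  # one speaker if a == b, otherwise two
        pool = [s for s in speakers if s not in involved]
        yield (a, b), pool


# Hypothetical labels for the 30 available speakers.
speakers = [f"spk{i:02d}" for i in range(30)]
splits = list(leave_out_splits(speakers))

# Same-speaker comparisons leave 1 speaker out; different-speaker ones leave 2 out.
same = [(pair, pool) for pair, pool in splits if pair[0] == pair[1]]
diff = [(pair, pool) for pair, pool in splits if pair[0] != pair[1]]

print(len(same), len(same[0][1]))  # number of same-speaker comparisons, pool size
print(len(diff), len(diff[0][1]))  # number of different-speaker comparisons, pool size
```

Under this reading, every comparison still draws on 28 or 29 of the 30 speakers, which is how a single set of 30 could serve both purposes without a speaker contributing to its own normalization or calibration.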