Department of Security and Crime Science, University College London, London, United Kingdom.
Department of Computer Science, University College London, London, United Kingdom.
PLoS One. 2023 Aug 2;18(8):e0285333. doi: 10.1371/journal.pone.0285333. eCollection 2023.
Speech deepfakes are artificial voices generated by machine learning models. Previous literature has highlighted deepfakes as one of the biggest security threats arising from progress in artificial intelligence due to their potential for misuse. However, studies investigating human detection capabilities are limited. We presented genuine and deepfake audio to n = 529 individuals and asked them to identify the deepfakes. We ran our experiments in English and Mandarin to understand if language affects detection performance and decision-making rationale. We found that detection capability is unreliable. Listeners correctly identified the deepfakes only 73% of the time, and there was no difference in detectability between the two languages. Increasing listener awareness by providing examples of speech deepfakes improved results only slightly. As speech synthesis algorithms improve and become more realistic, we can expect the detection task to become harder. The difficulty of detecting speech deepfakes confirms their potential for misuse and signals that defenses against this threat are needed.