Khanjani Zahra, Watson Gabrielle, Janeja Vandana P
Department of Information System, University of Maryland Baltimore County, Baltimore, MD, United States.
Front Big Data. 2023 Jan 9;5:1001063. doi: 10.3389/fdata.2022.1001063. eCollection 2022.
A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods to be passed off as real; it can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI generated or AI manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated entirely from AI-generated content. Deepfakes have started to have a major impact on society, with more generation mechanisms emerging every day. This article contributes to understanding the landscape of deepfakes and their detection and generation methods. We evaluate various categories of deepfakes, especially audio. The purpose of this survey is to provide readers with a deeper understanding of (1) the different deepfake categories; (2) how they can be created and detected; and (3) in more detail, how audio deepfakes are created and detected, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that most of the focus is on video deepfakes and, in particular, on their generation. For text deepfakes, we found more generation methods but very few robust detection methods, including for fake news detection, which has become a controversial area of research because of the potentially heavy overlap with human-generated fake content. Our study reveals a clear need for research on audio deepfakes, and particularly on their detection. Unlike existing survey papers, which mostly focus on video and image deepfakes, this survey focuses mainly on audio deepfakes, which most existing surveys overlook.
This article's most important contribution is to critically analyze audio deepfake research, mostly from 2016 to 2021, and to provide a unique source for it. To the best of our knowledge, this is the first survey in English focusing on audio deepfake generation and detection.