Audio deepfakes: A survey.

Author information

Khanjani Zahra, Watson Gabrielle, Janeja Vandana P

Affiliation

Department of Information System, University of Maryland Baltimore County, Baltimore, MD, United States.

Publication information

Front Big Data. 2023 Jan 9;5:1001063. doi: 10.3389/fdata.2022.1001063. eCollection 2022.

DOI:10.3389/fdata.2022.1001063

PMID:36700137

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9869423/

Abstract

A deepfake is content or material that has been synthetically generated or manipulated using artificial intelligence (AI) methods so that it can be passed off as real; it can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI-generated or AI-manipulated and closely resemble authentic artifacts. In some cases, a deepfake can be fabricated entirely from AI-generated content. Deepfakes have started to have a major impact on society, with more generation mechanisms emerging every day. This article contributes to understanding the landscape of deepfakes and their generation and detection methods. We evaluate various categories of deepfakes, especially audio. The purpose of this survey is to give readers a deeper understanding of (1) the different deepfake categories; (2) how they can be created and detected; and (3) in more detail, how audio deepfakes are created and detected, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that most of the focus is on video deepfakes and, in particular, on their generation. For text deepfakes, we found more generation methods but very few robust detection methods, including for fake news detection, which has become a controversial area of research because of its potentially heavy overlap with human-generated fake content. Our study reveals a clear need for research on audio deepfakes, particularly their detection. Unlike existing survey papers, which mostly focus on video and image deepfakes, this survey focuses mainly on audio deepfakes, which most existing surveys overlook. This article's most important contribution is to critically analyze, and provide a unique source of, audio deepfake research, mostly from 2016 to 2021. To the best of our knowledge, this is the first English-language survey focusing on audio deepfake generation and detection.

Similar articles

A Review of Image Processing Techniques for Deepfakes.

Sensors (Basel). 2022 Jun 16;22(12):4556. doi: 10.3390/s22124556.

Deepfakes Generation and Detection: A Short Survey.

J Imaging. 2023 Jan 13;9(1):18. doi: 10.3390/jimaging9010018.

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors.

Heliyon. 2023 Apr 3;9(4):e15090. doi: 10.1016/j.heliyon.2023.e15090. eCollection 2023 Apr.

Deepfake attack prevention using steganography GANs.

PeerJ Comput Sci. 2022 Oct 20;8:e1125. doi: 10.7717/peerj-cs.1125. eCollection 2022.

A Robust Approach to Multimodal Deepfake Detection.

J Imaging. 2023 Jun 19;9(6):122. doi: 10.3390/jimaging9060122.

Deepfake forensics: a survey of digital forensic methods for multimodal deepfake identification on social media.

PeerJ Comput Sci. 2024 May 27;10:e2037. doi: 10.7717/peerj-cs.2037. eCollection 2024.

Countering Malicious DeepFakes: Survey, Battleground, and Horizon.

Int J Comput Vis. 2022;130(7):1678-1734. doi: 10.1007/s11263-022-01606-8. Epub 2022 May 4.

The Face Deepfake Detection Challenge.

J Imaging. 2022 Sep 28;8(10):263. doi: 10.3390/jimaging8100263.

Do deepfake videos undermine our epistemic trust? A thematic analysis of tweets that discuss deepfakes in the Russian invasion of Ukraine.

PLoS One. 2023 Oct 25;18(10):e0291668. doi: 10.1371/journal.pone.0291668. eCollection 2023.

Cited by

Audio Deepfake Detection: What Has Been Achieved and What Lies Ahead.

Sensors (Basel). 2025 Mar 22;25(7):1989. doi: 10.3390/s25071989.

OpenAI's Sora and Google's Veo 2 in Action: A Narrative Review of Artificial Intelligence-driven Video Generation Models Transforming Healthcare.

Cureus. 2025 Jan 17;17(1):e77593. doi: 10.7759/cureus.77593. eCollection 2025 Jan.

Deepfake: definitions, performance metrics and standards, datasets, and a meta-review.

Front Big Data. 2024 Sep 4;7:1400024. doi: 10.3389/fdata.2024.1400024. eCollection 2024.

A systematic review of AI literacy scales.

NPJ Sci Learn. 2024 Aug 6;9(1):50. doi: 10.1038/s41539-024-00264-4.

References

A preliminary analysis of AI based smartphone application for diagnosis of COVID-19 using chest X-ray images.

Expert Syst Appl. 2021 Nov 30;183:115401. doi: 10.1016/j.eswa.2021.115401. Epub 2021 Jun 12.

Forecasting of COVID-19 using deep layer Recurrent Neural Networks (RNNs) with Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM) cells.

Chaos Solitons Fractals. 2021 May;146:110861. doi: 10.1016/j.chaos.2021.110861. Epub 2021 Mar 14.

Audio-based snore detection using deep neural networks.

Comput Methods Programs Biomed. 2021 Mar;200:105917. doi: 10.1016/j.cmpb.2020.105917. Epub 2020 Dec 25.

A Style-Based Generator Architecture for Generative Adversarial Networks.

IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4217-4228. doi: 10.1109/TPAMI.2020.2970919. Epub 2021 Nov 3.

Deep learning.

Nature. 2015 May 28;521(7553):436-44. doi: 10.1038/nature14539.

Foreign accent conversion in computer assisted pronunciation training.

Speech Commun. 2009 Oct;51(10):920-932. doi: 10.1016/j.specom.2008.11.004.
