Detecting images generated by diffusers.

Author information

Coccomini Davide Alessandro, Esuli Andrea, Falchi Fabrizio, Gennaro Claudio, Amato Giuseppe

Affiliations

Institute of Information Science and Technologies "Alessandro Faedo", Italian National Research Council, Pisa, Tuscany, Italy.

Information Engineering, University of Pisa, Pisa, Tuscany, Italy.

Publication information

PeerJ Comput Sci. 2024 Jul 10;10:e2127. doi: 10.7717/peerj-cs.2127. eCollection 2024.

Abstract

In recent years, the field of artificial intelligence has seen a remarkable surge in the generation of synthetic images, driven by advances in deep learning. These images, often created by complex algorithms, closely mimic real photographs and blur the line between the real and the artificial. Their proliferation raises a pressing challenge: how to distinguish genuine images from generated ones accurately and reliably. This article explores the task of detecting images generated by text-to-image diffusion models, highlighting the challenges and peculiarities of this setting. To study it, we consider images generated from captions in the MSCOCO and Wikimedia datasets using two state-of-the-art models: Stable Diffusion and GLIDE. Our experiments show that the generated images can be detected using simple multi-layer perceptrons (MLPs) trained on features extracted by CLIP or RoBERTa, or using traditional convolutional neural networks (CNNs). The CNNs perform remarkably well, particularly when pretrained on large datasets. We also observe that models trained on images generated by Stable Diffusion can occasionally detect images generated by GLIDE, but only on the MSCOCO dataset; the reverse is not true. Finally, we find that combining the images with their associated text can, in some cases, improve generalization, especially when the textual features are closely related to the visual ones. We also find that the type of subject depicted in the image can significantly affect performance. This work provides insights into the feasibility of detecting generated images, with implications for security and privacy in real-world applications. The code to reproduce our results is available at: https://github.com/davide-coccomini/Detecting-Images-Generated-by-Diffusers.
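The abstract reports that generated images can be detected with simple MLPs on CLIP or RoBERTa features, and that adding caption information sometimes helps generalization. The following is a minimal, hypothetical sketch of that general idea only, not the authors' implementation (their code is in the repository linked above): a one-hidden-layer MLP over the concatenation of an image embedding and a caption embedding, trained with binary cross-entropy to label inputs as real (0) or generated (1). The random "embeddings" in `synthetic_pair`, the concatenation fusion in `fuse`, the dimensions, and the hyperparameters are all illustrative assumptions; in practice the vectors would come from pretrained encoders such as those named above.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse(image_emb, text_emb):
    # Hypothetical early fusion: concatenate image and caption features.
    return list(image_emb) + list(text_emb)


class TinyMLP:
    """One-hidden-layer MLP that outputs P(generated) for a feature vector."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.w1, self.b1)]
        h = [max(p, 0.0) for p in pre]  # ReLU activation
        logit = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return pre, h, sigmoid(logit)

    def train_step(self, x, y, lr):
        # One SGD step on binary cross-entropy (y = 1 means "generated").
        pre, h, p = self.forward(x)
        d_logit = p - y  # dLoss/dlogit for sigmoid followed by BCE
        for j in range(len(h)):
            # Gradient into hidden unit j, using w2[j] before it is updated.
            d_pre = d_logit * self.w2[j] if pre[j] > 0.0 else 0.0
            self.w2[j] -= lr * d_logit * h[j]
            row = self.w1[j]
            for i in range(len(x)):
                row[i] -= lr * d_pre * x[i]
            self.b1[j] -= lr * d_pre
        self.b2 -= lr * d_logit

    def predict(self, x):
        return 1 if self.forward(x)[2] >= 0.5 else 0


def synthetic_pair(rng, generated, dim=8):
    # Hypothetical stand-in for precomputed encoder embeddings (e.g. CLIP):
    # generated samples get a mean shift in the image part so the toy task
    # is learnable. Real features would replace this function.
    shift = 1.0 if generated else 0.0
    image_emb = [rng.gauss(shift, 1.0) for _ in range(dim)]
    text_emb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return image_emb, text_emb


def make_dataset(n, seed):
    rng = random.Random(seed)
    data = []
    for k in range(n):
        y = k % 2  # balanced labels: 1 = generated, 0 = real
        image_emb, text_emb = synthetic_pair(rng, generated=bool(y))
        data.append((fuse(image_emb, text_emb), y))
    return data


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


def train_and_evaluate(epochs=10, lr=0.05, seed=0):
    train = make_dataset(400, seed=seed + 1)
    test = make_dataset(200, seed=seed + 2)
    model = TinyMLP(n_in=len(train[0][0]), n_hidden=16, seed=seed)
    rng = random.Random(seed + 3)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, y in train:
            model.train_step(x, y, lr)
    return accuracy(model, test)


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic features: {train_and_evaluate():.2f}")
```

Concatenation is simply the most basic way to combine two feature vectors and is used here only for illustration. Replacing `synthetic_pair` with real image and caption embeddings, and evaluating on images from a generator not seen during training (for example, training on Stable Diffusion images and testing on GLIDE images), would turn this toy into the kind of cross-generator check the abstract describes.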

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48c7/11322988/be8101156204/peerj-cs-10-2127-g001.jpg