
The visual communication using generative artificial intelligence in the context of new media.

Author information

Liu Weinan, Kim Hyung-Gi

Affiliation

Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, Seoul, 06974, South Korea.

Publication information

Sci Rep. 2025 Apr 4;15(1):11577. doi: 10.1038/s41598-025-96869-9.

Abstract

The purpose of this work is to explore methods of visual communication based on generative artificial intelligence in the context of new media. This work proposes an automatic image generation and recognition model that integrates a Conditional Generative Adversarial Network (CGAN) with the Transformer algorithm. The generator component of the model takes noise vectors and conditional variables as inputs. A Transformer module is then incorporated, in which the multi-head self-attention mechanism enables the model to establish complex relationships among different data points; these features are further refined through linear transformations and activation functions to enhance the feature representations. Ultimately, the self-attention mechanism captures long-range dependencies within images, facilitating the generation of high-quality images that satisfy specific conditions. The model's performance is assessed, and the findings show that the accuracy of the proposed model reaches 95.69%, exceeding the baseline Generative Adversarial Network (GAN) by more than 4%. Additionally, the model attains a Peak Signal-to-Noise Ratio (PSNR) of 33 dB and a Structural Similarity Index (SSIM) of 0.83, indicating higher image generation quality and recognition accuracy. The proposed model therefore achieves high recognition and prediction accuracy for generated images, along with higher image quality, suggesting significant application value for visual communication in the new media era.
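As a concrete illustration of the architecture the abstract describes, below is a minimal PyTorch sketch of a conditional generator with an embedded Transformer encoder block. It is not the authors' released code: the layer sizes, the class name, the two-token sequence construction, and the 32x32 output resolution are all assumptions made for this example.

# Minimal sketch (assumptions, not the paper's implementation) of a CGAN
# generator that inserts a Transformer encoder between the input projection
# and the image-synthesis head, as the abstract describes.
import torch
import torch.nn as nn

class CGANTransformerGenerator(nn.Module):
    def __init__(self, noise_dim=100, num_classes=10, embed_dim=256,
                 num_heads=8, img_channels=3, img_size=32):
        super().__init__()
        self.img_channels = img_channels
        self.img_size = img_size
        # Conditional variable: a learned embedding of the class label.
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        # Project the noise vector into the same embedding space.
        self.noise_proj = nn.Linear(noise_dim, embed_dim)
        # Transformer encoder: multi-head self-attention followed by a
        # position-wise feed-forward network (the linear transformations
        # and activation functions that refine the feature representation).
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=embed_dim * 4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Map the refined features to a flattened image, then reshape.
        self.to_img = nn.Sequential(
            nn.Linear(embed_dim, img_channels * img_size * img_size),
            nn.Tanh())

    def forward(self, noise, labels):
        # Treat the projected noise and the condition embedding as a
        # two-token sequence so self-attention can relate them.
        tokens = torch.stack(
            [self.noise_proj(noise), self.label_embed(labels)], dim=1)
        features = self.transformer(tokens)
        # Pool the sequence and decode to an image.
        out = self.to_img(features.mean(dim=1))
        return out.view(-1, self.img_channels, self.img_size, self.img_size)

# Usage: generate a batch of 4 images conditioned on class labels.
g = CGANTransformerGenerator()
imgs = g(torch.randn(4, 100), torch.tensor([0, 1, 2, 3]))
print(imgs.shape)  # torch.Size([4, 3, 32, 32])

Here self-attention lets the projected noise and the condition embedding exchange information, while the feed-forward sublayer supplies the linear transformations and activations the abstract mentions. A full implementation would presumably operate on longer token sequences (for example, one token per image patch) so that attention can model the spatial long-range dependencies within an image.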

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/778d/11971437/2b4f8c724e15/41598_2025_96869_Fig1_HTML.jpg
