Art design integrating visual relation and affective semantics based on Convolutional Block Attention Mechanism-generative adversarial network model.

Authors

Shen Jiadong, Wang Jian

Affiliation

School of Design and Art, Changsha University of Science and Technology, Changsha, Hunan, China.

Publication information

PeerJ Comput Sci. 2024 Aug 30;10:e2274. doi: 10.7717/peerj-cs.2274. eCollection 2024.

Abstract

Scene-based image semantic extraction and its precise sentiment expression significantly enhance artistic design. To address the incongruity between image features and sentiment features caused by non-bilinear pooling, this study introduces a generative adversarial network (GAN) model that integrates visual relationships with sentiment semantics. The GAN-based regularizer is utilized during training to incorporate target information derived from the contextual information into the process. This regularization mechanism imposes stronger penalties for inaccuracies in subject-object type predictions and integrates a sentiment corpus to generate more human-like descriptive statements. The capsule network is employed to reconstruct sentences and predict probabilities in the discriminator. To preserve crucial focal points in feature extraction, the Convolutional Block Attention Mechanism (CBAM) is introduced. Furthermore, two bidirectional long short-term memory (LSTM) modules are used to model both target and relational contexts, thereby refining target labels and inter-target relationships. Experimental results highlight the model's superiority over comparative models in terms of accuracy, BiLingual Evaluation Understudy (BLEU) score, and text preservation rate. The proposed model achieves an accuracy of 95.40% and the highest BLEU score of 16.79, effectively capturing both the label content and the emotional nuances within the image.
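
The abstract reports a BLEU score, so a small worked example of how such a score is computed may help readers interpret the number. The sketch below is not the authors' evaluation code. It assumes whitespace tokenization, a single reference sentence, no smoothing, and a 0-100 reporting scale; the function names `ngrams` and `sentence_bleu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU on a 0-100 scale (illustrative only)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: 1 if the candidate is longer than the reference,
    # otherwise exp(1 - r/c), which is also 1 when the lengths are equal.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on a mat"))
```

Published BLEU figures are usually computed at corpus level, often with multiple references and a specific tokenizer, so values from this sentence-level sketch are not directly comparable to the 16.79 reported above.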

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b53b/11419622/745ec9407884/peerj-cs-10-2274-g001.jpg
