Zhou Jinfei, Zhu Yaping, Zhang Yana, Yang Cheng, Pan Hong
State Key Laboratory of Media Convergence and Communication, The Communication University of China, Beijing, 100024 China.
Data Science Research Institute, Swinburne University of Technology, Melbourne, 3122 Australia.
Neural Comput Appl. 2023;35(13):9481-9500. doi: 10.1007/s00521-022-08072-w. Epub 2023 Mar 16.
Automatically generating descriptions for disaster news images could effectively accelerate the spread of disaster messages and relieve news editors of the tedious work of processing news materials. Image caption algorithms are notable for generating captions directly from image content. However, current image caption algorithms trained on existing image caption datasets fail to describe disaster images with the fundamental news elements. In this paper, we developed a large-scale Chinese caption dataset of disaster news images (DNICC19k), which collects and annotates a large number of disaster-related news images. Furthermore, we proposed a spatial-aware topic-driven caption network (STCNet) to encode the interrelationships between these news objects and generate descriptive sentences related to news topics. STCNet first constructs a graph representation based on object feature similarity. The graph reasoning module then uses spatial information to infer the aggregation weights of adjacent nodes according to a learnable Gaussian kernel function. Finally, the generation of news sentences is driven by the spatial-aware graph representations and the news topic distribution. Experimental results demonstrate that STCNet trained on DNICC19k not only automatically creates descriptive sentences related to news topics for disaster news images, but also outperforms benchmark models such as Bottom-up, NIC, Show attend and AoANet on multiple evaluation metrics, achieving CIDEr and BLEU-4 scores of 60.26 and 17.01, respectively.
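For intuition, the following is a minimal PyTorch-style sketch of the spatial-aware graph reasoning step described above: detected object features form graph nodes, feature similarity defines the adjacency, and a learnable Gaussian kernel over bounding-box center distances modulates how strongly each neighbor is aggregated. All class, function, and parameter names (and the 2048/512 dimensions) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAwareGraphReasoning(nn.Module):
    """Sketch of spatial-aware graph reasoning: feature-similarity adjacency
    re-weighted by a learnable Gaussian kernel over box-center distances."""

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        self.query = nn.Linear(feat_dim, hidden_dim)
        self.key = nn.Linear(feat_dim, hidden_dim)
        self.value = nn.Linear(feat_dim, hidden_dim)
        # Learnable bandwidth (sigma) of the Gaussian kernel, stored in log space.
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
        # feats: (N, feat_dim) region features; boxes: (N, 4) as (x1, y1, x2, y2),
        # assumed normalized to [0, 1] by image width/height.
        centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                               (boxes[:, 1] + boxes[:, 3]) / 2], dim=-1)   # (N, 2)
        dist2 = torch.cdist(centers, centers).pow(2)                        # (N, N)
        # Log of the Gaussian kernel: closer objects get larger spatial weights.
        gauss_log = -dist2 / (2 * self.log_sigma.exp().pow(2))

        # Feature-similarity adjacency (scaled dot-product form).
        q, k, v = self.query(feats), self.key(feats), self.value(feats)
        sim = q @ k.t() / q.size(-1) ** 0.5                                 # (N, N)

        # Adding the log-kernel to the logits multiplies the attention
        # numerators by the Gaussian kernel before neighbor aggregation.
        attn = F.softmax(sim + gauss_log, dim=-1)
        return attn @ v                                                     # (N, hidden_dim)


if __name__ == "__main__":
    feats = torch.randn(5, 2048)   # 5 detected regions
    boxes = torch.rand(5, 4)       # toy normalized boxes
    print(SpatialAwareGraphReasoning()(feats, boxes).shape)  # torch.Size([5, 512])
```

In this sketch the spatial kernel enters as an additive term on the attention logits, which is equivalent to multiplying the softmax numerators by the Gaussian weight; whether the paper combines the two adjacencies this way or by a separate gating step is not specified in the abstract.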