Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries.

Author information

Bappy Jawadul H, Simons Cody, Nataraj Lakshmanan, Manjunath B S, Roy-Chowdhury Amit K

Publication information

IEEE Trans Image Process. 2019 Jul;28(7):3286-3300. doi: 10.1109/TIP.2019.2895466. Epub 2019 Jan 25.

DOI:10.1109/TIP.2019.2895466

Abstract

With advanced image journaling tools, one can easily alter the semantic meaning of an image by exploiting certain manipulation techniques such as copy clone, object splicing, and removal, which mislead the viewers. In contrast, the identification of these manipulations becomes a very challenging task as manipulated regions are not visually apparent. This paper proposes a high-confidence manipulation localization architecture that utilizes resampling features, long short-term memory (LSTM) cells, and an encoder-decoder network to segment out manipulated regions from non-manipulated ones. Resampling features are used to capture artifacts, such as JPEG quality loss, upsampling, downsampling, rotation, and shearing. The proposed network exploits larger receptive fields (spatial maps) and frequency-domain correlation to analyze the discriminative characteristics between the manipulated and non-manipulated regions by incorporating the encoder and LSTM network. Finally, the decoder network learns the mapping from low-resolution feature maps to pixel-wise predictions for image tamper localization. With the predicted mask provided by the final layer (softmax) of the proposed architecture, end-to-end training is performed to learn the network parameters through back-propagation using the ground-truth masks. Furthermore, a large image splicing dataset is introduced to guide the training process. The proposed method is capable of localizing image manipulations at the pixel level with high precision, which is demonstrated through rigorous experimentation on three diverse datasets.
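The abstract says the final softmax layer yields a predicted mask that is compared with the ground-truth mask during back-propagation. The sketch below illustrates that comparison on a toy 2x2 image in plain Python. It is only an illustration under stated assumptions: the abstract does not name the loss function, so per-pixel cross-entropy is assumed here, and the function names, the two-class label convention (0 = non-manipulated, 1 = manipulated), and the toy logits are all hypothetical. It does not reproduce the resampling features, the LSTM, or the encoder-decoder layers.

```python
import math

def pixelwise_softmax(logits):
    """Turn H x W x 2 class scores into per-pixel class probabilities."""
    probs = []
    for row in logits:
        prob_row = []
        for pixel in row:
            peak = max(pixel)  # subtract the max for numerical stability
            exps = [math.exp(score - peak) for score in pixel]
            total = sum(exps)
            prob_row.append([e / total for e in exps])
        probs.append(prob_row)
    return probs

def mask_cross_entropy(probs, mask):
    """Mean negative log-likelihood of the ground-truth label at each pixel.

    Assumption: cross-entropy is a common choice for softmax outputs;
    the abstract does not state which loss the authors use.
    """
    losses = [
        -math.log(probs[i][j][mask[i][j]])
        for i in range(len(mask))
        for j in range(len(mask[0]))
    ]
    return sum(losses) / len(losses)

def predicted_mask(probs):
    """Label each pixel with its more probable class (ties go to class 0)."""
    return [[0 if p[0] >= p[1] else 1 for p in row] for row in probs]

# Hypothetical decoder outputs for a 2x2 image: [score_clean, score_manipulated].
logits = [
    [[2.0, -1.0], [0.5, 0.5]],
    [[-3.0, 3.0], [1.0, 0.0]],
]
# Hypothetical ground-truth mask: 1 marks a manipulated pixel.
truth = [
    [0, 0],
    [1, 0],
]

probs = pixelwise_softmax(logits)
loss = mask_cross_entropy(probs, truth)
pred = predicted_mask(probs)
print("predicted mask:", pred)
print("mean per-pixel loss:", round(loss, 4))
```

In an actual training loop, the gradient of such a loss with respect to the network parameters would be what back-propagation computes; here the loss is only evaluated, not minimized.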
