Zhao Lin, Chen Changsheng, Huang Jiwu
IEEE Trans Image Process. 2021;30:7964-7979. doi: 10.1109/TIP.2021.3112048. Epub 2021 Sep 22.
With the ongoing popularization of online services, digital document images are used in a wide range of applications. Meanwhile, deep learning-based text editing algorithms have emerged that alter the textual information of an image in an end-to-end fashion. In this work, we present a low-cost document forgery algorithm that uses existing deep learning-based technologies to edit practical document images. To achieve this goal, we address the limitations of existing text editing algorithms on complicated characters and complex backgrounds with a set of network design strategies. First, unnecessary confusion in the supervision data is avoided by disentangling the textual and background information in the source images. Second, to capture the structure of complicated components, the text skeleton is provided as auxiliary information and the continuity in texture is considered explicitly in the loss function. Third, the forgery traces induced by the text editing operation are mitigated by post-processing operations that account for the distortions from the print-and-scan channel. Quantitative comparisons between the proposed method and the existing approach show the advantages of our design: the reconstruction error measured in MSE is reduced by about 2/3, and the reconstruction quality measured in PSNR and SSIM is improved by 4 dB and 0.21, respectively. Qualitative experiments confirm that the reconstruction results of the proposed method are visually better than those of the existing approach for both complicated characters and complex textures. More importantly, we demonstrate the performance of the proposed document forgery algorithm in a practical scenario where an attacker is able to alter the textual information in an identity document using only one sample in the target domain. The forged-and-recaptured samples created by the proposed text editing attack and recapturing operation have successfully fooled some existing document authentication systems.
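For readers unfamiliar with the reconstruction metrics quoted in the abstract, the following is a minimal sketch of how MSE and PSNR are commonly computed for 8-bit grayscale images, and how a change in MSE maps to a change in PSNR. It is not the authors' evaluation code: the function names (`mse`, `psnr`, `psnr_gain_from_mse_ratio`), the list-of-rows image representation, and the peak value of 255 are assumptions made for illustration. SSIM is omitted because it needs windowed statistics that do not fit a short example.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized grayscale images.

    Images are given as lists of rows of pixel values (an assumed,
    illustrative representation).
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in a)
    if n == 0:
        raise ValueError("images must not be empty")
    total = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / n


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


def psnr_gain_from_mse_ratio(ratio):
    """PSNR change in dB when the MSE of one image pair is scaled by `ratio`."""
    return -10.0 * math.log10(ratio)


# Toy data (hypothetical, not from the paper): a flat reference image and
# two reconstructions with uniform errors of 6 and 3 gray levels.
reference = [[100, 100], [100, 100]]
baseline = [[106, 106], [106, 106]]
improved = [[103, 103], [103, 103]]

gain = psnr(reference, improved) - psnr(reference, baseline)
print(f"MSE baseline={mse(reference, baseline)}, improved={mse(reference, improved)}")
print(f"PSNR gain = {gain:.2f} dB")
```

On a single image pair, cutting the MSE to one third of its value raises PSNR by 10·log10(3), roughly 4.8 dB, whereas a 4 dB gain corresponds to an MSE ratio of about 0.40. The figures in the abstract are presumably averages over a test set, so the per-image identity need not hold exactly between the reported "about 2/3" and "4 dB".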