A New Language-Independent Deep CNN for Scene Text Detection and Style Transfer in Social Media Images.

Author Information

Shivakumara Palaiahnakote, Banerjee Ayan, Pal Umapada, Nandanwar Lokesh, Lu Tong, Liu Cheng-Lin

Publication Information

IEEE Trans Image Process. 2023;32:3552-3566. doi: 10.1109/TIP.2023.3287038. Epub 2023 Jun 29.

Abstract

Due to the quality degradation introduced by different social media platforms and the arbitrary languages appearing in natural scenes, detecting text in social media images and transferring its style is challenging. This paper presents a novel end-to-end model for text detection and text style transfer in social media images. The key idea of the proposed work is to find dominant information, such as fine details, in degraded (social media) images and then restore the structure of the character information. To this end, we first introduce the novel idea of extracting gradients from the frequency domain of the input image to reduce the adverse effects of different social media platforms, which yields text candidate points. The text candidates are then connected into components and used for text detection via a UNet++-like network with an EfficientNet backbone (EffiUNet++). To address the style transfer problem, we devise a generative model comprising a target encoder and style parameter networks (TESP-Net), which generates the target characters by leveraging the recognition results from the first stage. Specifically, a series of residual mappings and a position attention module are devised to improve the shape and structure of the generated characters. The whole model is trained end-to-end to optimize performance. Experiments on our social media dataset and on benchmark datasets for natural scene text detection and text style transfer show that the proposed model outperforms existing text detection and style transfer methods in multilingual and cross-language scenarios.
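The abstract only sketches the first stage, frequency-domain gradient extraction. Below is a minimal, hypothetical NumPy illustration of how such a step could work: high-pass filtering in the Fourier domain to suppress smooth, platform-specific degradation, followed by gradient-magnitude thresholding to obtain text candidate points. The function name, the ideal high-pass filter, and the cutoff and percentile parameters are illustrative assumptions, not the paper's actual formulation.

```python
# A minimal sketch (not the authors' code) of a frequency-domain gradient
# step: suppress low-frequency quality artifacts with a high-pass filter in
# the Fourier domain, then keep high-gradient pixels as text candidates.
import numpy as np

def text_candidate_points(gray: np.ndarray, cutoff: int = 10,
                          percentile: float = 95.0) -> np.ndarray:
    """Return a boolean mask of text candidate points for a grayscale image.

    `cutoff` (high-pass radius) and `percentile` (gradient threshold) are
    illustrative assumptions, not values from the paper.
    """
    h, w = gray.shape
    # Forward FFT, shifting the zero-frequency component to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))

    # Ideal high-pass filter: zero out a disc of radius `cutoff` around DC,
    # which removes smooth shading and compression blur but keeps edges.
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    spectrum[dist < cutoff] = 0

    # Back to the spatial domain; only the magnitude is needed.
    filtered = np.abs(np.fft.ifft2(np.fft.ifftshift(spectrum)))

    # Gradient magnitude of the high-pass image highlights stroke boundaries.
    gy, gx = np.gradient(filtered)
    grad_mag = np.hypot(gx, gy)

    # Keep the strongest responses as candidate points.
    return grad_mag >= np.percentile(grad_mag, percentile)
```

In the paper's pipeline, points like these would then be grouped into components and passed to the EffiUNet++ detector; the sketch stops at the candidate-point stage.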
