Unambiguous Scene Text Segmentation with Referring Expression Comprehension.

Author information

Rong Xuejian, Yi Chucai, Tian Yingli

Publication information

IEEE Trans Image Process. 2019 Jul 26. doi: 10.1109/TIP.2019.2930176.

DOI:10.1109/TIP.2019.2930176

Abstract

Text instances provide valuable information for understanding and interpreting natural scenes. The rich, precise high-level semantics embodied in text can help us understand the world around us and enable a wide range of real-world applications. While most recent visual phrase grounding approaches focus on general objects, this paper explores extracting designated text and predicting unambiguous scene text segmentation masks, i.e., segmenting scene text from natural language descriptions (referring expressions) such as "orange text on a little boy in black swinging a bat". Solving this novel problem enables accurate segmentation of scene text instances from complex backgrounds. In the proposed framework, a unified deep network jointly models visual and linguistic information by encoding both region-level and pixel-level visual features of natural scene images into spatial feature maps, and then decodes them into saliency response maps of text instances. For quantitative evaluation, we establish a new scene text referring expression segmentation dataset, COCO-CharRef. Experimental results demonstrate the effectiveness of the proposed framework on the text instance segmentation task. By combining image-based visual features with language-based textual explanations, our framework outperforms baselines derived from state-of-the-art text localization and natural language object retrieval methods on the COCO-CharRef dataset.
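The abstract describes encoding visual features into spatial feature maps, modeling them jointly with a referring expression, and decoding a saliency response map for the referred text. The paper's exact architecture is not given in this record, so the following is only a minimal sketch under stated assumptions: the expression is already encoded as a fixed-length vector, fusion is done by tiling that vector over every spatial cell and concatenating it with the visual feature there, and a 1x1 linear scoring layer plus a sigmoid produces the per-pixel saliency. The thresholded mask is scored with intersection-over-union (IoU), a common referring-segmentation metric assumed here rather than confirmed as the COCO-CharRef protocol. The names `fuse_and_decode`, `binarize`, and `mask_iou` are hypothetical.

```python
import math
from typing import List

# Assumed (hypothetical) layout: a visual feature map is H x W x C_v as nested
# lists, and the referring expression is pre-encoded as a C_l-dim vector.
FeatureMap = List[List[List[float]]]
Mask = List[List[int]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fuse_and_decode(visual: FeatureMap, expr: List[float],
                    weights: List[float], bias: float) -> List[List[float]]:
    """Tile the expression vector over every spatial cell, concatenate it with
    the visual feature at that cell, and apply a 1x1 linear scoring layer and a
    sigmoid, yielding a per-pixel saliency response in (0, 1)."""
    saliency = []
    for row in visual:
        out_row = []
        for cell in row:
            joint = cell + expr  # channel-wise concatenation: C_v + C_l values
            if len(joint) != len(weights):
                raise ValueError("weights must have C_v + C_l entries")
            score = sum(w * f for w, f in zip(weights, joint)) + bias
            out_row.append(sigmoid(score))
        saliency.append(out_row)
    return saliency


def binarize(saliency: List[List[float]], threshold: float = 0.5) -> Mask:
    """Turn the saliency response map into a binary text mask."""
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape.
    Two empty masks are treated as a perfect match (a chosen convention)."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += p & g
            union += p | g
    return inter / union if union else 1.0
```

For example, with a 2x2 map holding one visual channel (`[[[1.0], [0.0]], [[1.0], [0.0]]]`), expression vector `[0.5]`, weights `[4.0, 2.0]`, and bias `-3.0`, the cells with visual value 1.0 score 2.0 and the others score -2.0, so the binarized mask is `[[1, 0], [1, 0]]`. Against a ground-truth mask `[[1, 0], [0, 0]]` the IoU is 0.5. Tile-and-concatenate fusion is chosen here only because it is the simplest way to condition every pixel on the expression; a learned network would replace the single linear layer with trained convolutional decoders.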

Similar articles

Unambiguous Text Localization, Retrieval, and Recognition for Cluttered Scenes.

IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1638-1652. doi: 10.1109/TPAMI.2020.3018491. Epub 2022 Feb 3.

Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing.

Front Neurorobot. 2020 Jun 25;14:43. doi: 10.3389/fnbot.2020.00043. eCollection 2020.

Visual Saliency Models for Text Detection in Real World.

PLoS One. 2014 Dec 10;9(12):e114539. doi: 10.1371/journal.pone.0114539. eCollection 2014.

Semi-Supervised Pixel-Level Scene Text Segmentation by Mutually Guided Network.

IEEE Trans Image Process. 2021;30:8212-8221. doi: 10.1109/TIP.2021.3113157. Epub 2021 Sep 30.

Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding.

IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):8517-8533. doi: 10.1109/TPAMI.2024.3410324. Epub 2024 Nov 6.

Box2Mask: Box-Supervised Instance Segmentation via Level-Set Evolution.

IEEE Trans Pattern Anal Mach Intell. 2024 Jul;46(7):5157-5173. doi: 10.1109/TPAMI.2024.3363054. Epub 2024 Jun 5.

Cross-Modal Progressive Comprehension for Referring Segmentation.

IEEE Trans Pattern Anal Mach Intell. 2022 Sep;44(9):4761-4775. doi: 10.1109/TPAMI.2021.3079993. Epub 2022 Aug 4.

PiGLET: Pixel-Level Grounding of Language Expressions With Transformers.

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12206-12221. doi: 10.1109/TPAMI.2023.3286760. Epub 2023 Sep 5.

The Linguistic Analysis of Scene Semantics: LASS.

Behav Res Methods. 2020 Dec;52(6):2349-2371. doi: 10.3758/s13428-020-01390-8.

PMID: 31369378
