College of Information Science and Engineering, Xinjiang University, Urumqi, China.
Xinjiang Key Laboratory of Signal Detection and Processing, Urumqi, Xinjiang, China.
PLoS One. 2022 Aug 5;17(8):e0272322. doi: 10.1371/journal.pone.0272322. eCollection 2022.
With the spread of artificial intelligence, text detection is now widely used in real-world applications. Because the receptive field of the neural network is limited, most existing scene text detection methods cannot accurately detect small, arbitrarily oriented text instances, and they detect closely adjacent (mutually adhering) text instances at a low rate and are prone to false detections. To address these difficulties, we propose a new feature pyramid network for scene text detection, the Cross-Scale Attention Aggregation Feature Pyramid Network (CSAA-FPN). Specifically, we use an Attention Aggregation Feature Module (AAFM) to enhance features; it not only compensates for the weak features and small receptive fields of lightweight networks but also handles multi-scale information better and accurately separates adjacent text instances. The attention module CBAM is introduced to focus on effective information so that the output feature layer carries richer and more accurate information. Furthermore, we design an Adaptive Fusion Module (AFM), which weights the output features and attends to pixel-level information to further refine the features. Experiments on CTW1500, Total-Text, ICDAR2015, and MSRA-TD500 demonstrate the superiority of this model.
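The abstract names CBAM as the attention module but does not say how it is wired into CSAA-FPN. As a rough illustration only, the following is a minimal NumPy sketch of the commonly described CBAM formulation: channel attention from a shared two-layer MLP applied to average-pooled and max-pooled descriptors, followed by spatial attention from a k x k convolution over channel-wise average and max maps. It is not the authors' implementation. The function names, the single-image `(C, H, W)` layout, and the untrained weights `w1`, `w2`, and `kernel` are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) for a feature map x of shape (C, H, W).

    w1 (C // r, C) and w2 (C, C // r) are the shared two-layer MLP weights;
    these names and the bias-free MLP are assumptions of this sketch.
    """
    avg = x.mean(axis=(1, 2))  # (C,) global average pooling
    mx = x.max(axis=(1, 2))    # (C,) global max pooling

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # linear -> ReLU -> linear

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(x, kernel):
    """Per-pixel weights in (0, 1); kernel has shape (2, k, k) with k odd."""
    k = kernel.shape[-1]
    pad = k // 2
    # Stack channel-wise average and max maps into a 2-channel image.
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.empty((height, width))
    # Naive "same" convolution (cross-correlation), written for clarity.
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 5, 6))      # hypothetical feature map
    w1 = 0.1 * rng.standard_normal((2, 8))     # reduction ratio r = 4
    w2 = 0.1 * rng.standard_normal((8, 2))
    kernel = 0.1 * rng.standard_normal((2, 7, 7))
    refined = cbam(feat, w1, w2, kernel)
    print(refined.shape)
```

Both attention maps pass through a sigmoid, so each output value is the input value scaled by a factor between 0 and 1; the module reweights features rather than creating new ones. In a trained network the weights would be learned, and the loop-based convolution would be replaced by a framework convolution layer.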