Department of Information Science, Xi'an University of Technology, Xi'an 710054, China.
Sensors (Basel). 2021 Mar 10;21(6):1945. doi: 10.3390/s21061945.
A challenging aspect of scene text detection is handling curved text. To avoid the tedious manual annotation needed to train a curved text detector, and to overcome the limitations of regression-based text detectors on irregular text, we introduce a straightforward and efficient instance-aware curved scene text detector, look more than twice (LOMT), which gradually refines regression-based text detection results from a loosely bounding box to a compact polygon. LOMT consists mainly of a curved text shape approximation module and a component merging network. The shape approximation module uses a particle swarm optimization-based text shape approximation method (PSO-TSA) to fine-tune the quadrilateral text detection results so that they fit the curved text. The instance-aware component merging network (ICMN) merges incomplete sub-parts of a text instance into a more complete polygon. Experiments on five text datasets demonstrate that our method not only achieves excellent performance but also runs at relatively high speed. Ablation experiments show that PSO-TSA solves the text shape optimization problem efficiently and that ICMN merges components satisfactorily.
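The abstract does not spell out how PSO-TSA works, so the following is only a minimal sketch of generic particle swarm optimization applied to a toy shape-fitting problem, not the authors' method. All names (`pso_minimize`, `arc_error`), the parabolic arc model, the synthetic sample points, and the hyperparameter values are assumptions made for illustration. It shows the general idea of letting a swarm search for shape parameters that minimize a fitting error.

```python
import random

def pso_minimize(fitness, bounds, n_particles=30, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic PSO (illustrative; hyperparameters are assumed values)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search box, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # per-particle best position
    best_val = [fitness(p) for p in pos]      # per-particle best fitness
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

# Hypothetical toy data: points sampled from a curved text centerline,
# modelled here (as an assumption) by y = a * (x - 0.5)**2 + b.
xs = [i / 10 for i in range(11)]
target = [0.8 * (x - 0.5) ** 2 + 0.1 for x in xs]

def arc_error(params):
    """Sum of squared vertical distances between the arc and the samples."""
    a, b = params
    return sum((a * (x - 0.5) ** 2 + b - y) ** 2 for x, y in zip(xs, target))

best, err = pso_minimize(arc_error, [(-2.0, 2.0), (-1.0, 1.0)])
```

In this sketch the swarm searches over two arc parameters. A detector that refines a quadrilateral into a polygon would presumably search over vertex offsets instead and score each candidate against the image or a text score map, a detail the abstract leaves unspecified.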