Gunna Sanjana, Saluja Rohit, Jawahar Cheerakkuzhi Veluthemana
Centre for Vision Information Technology, International Institute of Information Technology, Hyderabad 500032, India.
J Imaging. 2022 Mar 23;8(4):86. doi: 10.3390/jimaging8040086.
Reading Indian scene text is challenging because of regional vocabulary, multiple fonts and scripts, and varying text size. This work investigates the significant differences between Indian and Latin Scene Text Recognition (STR) systems. Recent STR works rely on synthetic generators that use diverse fonts to ensure robust reading solutions. We propose using additional non-Unicode fonts alongside the commonly employed Unicode fonts to cover font diversity in such synthesizers for Indian languages. We also perform transfer learning experiments among six different Indian languages. Our transfer learning experiments on synthetic images with common backgrounds provide the insight that Indian scripts benefit more from each other than from the extensive English datasets. Our evaluations in real settings show significant improvements over previous methods on four Indian languages from standard datasets such as IIIT-ILST and MLT-17, and on a new dataset (which we release) containing 440 scene images with 500 Gujarati and 2535 Tamil words. Further enriching the synthetic dataset with non-Unicode fonts and multiple augmentations yields a Word Recognition Rate gain of over 33% on the IIIT-ILST Hindi dataset. We also present the results of lexicon-based transcription approaches for all six languages.
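As an illustration of the two evaluation ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that Word Recognition Rate (WRR) is the percentage of words whose predicted transcription exactly matches the ground truth, and that lexicon-based transcription replaces each prediction with the lexicon entry at the smallest edit distance. The function names `word_recognition_rate`, `edit_distance`, and `lexicon_transcribe` are hypothetical, and the nearest-word rule is an assumption, since the abstract does not specify how the lexicon is applied.

```python
def word_recognition_rate(predictions, ground_truths):
    """Percentage of words transcribed exactly right (assumed WRR definition)."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


def edit_distance(a, b):
    """Levenshtein distance over Unicode code points.

    Note: Indic scripts combine several code points into one visible
    character, so this counts code-point edits, not grapheme edits.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def lexicon_transcribe(prediction, lexicon):
    """Return the lexicon word closest to the raw prediction (assumed rule)."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))
```

For example, with the hypothetical lexicon `["delhi", "india"]`, the raw prediction `"delhl"` would be mapped to `"delhi"`, which turns a word-level error into a correct transcription and raises the WRR accordingly.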