Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity.

Authors

Gunna Sanjana, Saluja Rohit, Jawahar Cheerakkuzhi Veluthemana

Affiliation

Centre for Vision Information Technology, International Institute of Information Technology, Hyderabad 500032, India.

Publication

J Imaging. 2022 Mar 23;8(4):86. doi: 10.3390/jimaging8040086.

DOI:10.3390/jimaging8040086
PMID:35448213
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9025185/
Abstract

Reading scene text in Indian languages is difficult because of regional vocabulary, multiple fonts and scripts, and text size. This work investigates the significant differences between Indian and Latin Scene Text Recognition (STR) systems. Recent STR works rely on synthetic generators that use diverse fonts to ensure robust reading solutions. We propose using additional non-Unicode fonts alongside the commonly employed Unicode fonts to cover font diversity in such synthesizers for Indian languages. We also perform transfer learning experiments among six different Indian languages. Our transfer learning experiments on synthetic images with common backgrounds show that Indian scripts benefit more from each other than from the extensive English datasets. Our evaluations in real settings yield significant improvements over previous methods on four Indian languages from standard datasets such as IIIT-ILST and MLT-17, and on the new dataset we release, which contains 440 scene images with 500 Gujarati and 2535 Tamil words. Further enriching the synthetic dataset with non-Unicode fonts and multiple augmentations yields a Word Recognition Rate gain of over 33% on the IIIT-ILST Hindi dataset. We also present the results of lexicon-based transcription approaches for all six languages.
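
The abstract reports results as Word Recognition Rate (WRR) and mentions lexicon-based transcription. As a rough illustration only, the sketch below computes WRR as the fraction of words predicted exactly, and applies one common form of lexicon-based correction: replacing each prediction with the nearest lexicon word by edit distance. The abstract does not say which lexicon method the authors use, so `snap_to_lexicon`, the function names, and the sample words are all assumptions made for this example.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_recognition_rate(predictions: Sequence[str],
                          ground_truth: Sequence[str]) -> float:
    """Fraction of words whose predicted string equals the label exactly."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def snap_to_lexicon(prediction: str, lexicon: Sequence[str]) -> str:
    """Hypothetical lexicon step: return the lexicon word closest to the prediction."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


# Made-up sample data, not taken from the paper.
ground_truth = ["chennai", "hotel", "bank", "road"]
raw_predictions = ["chennal", "hotel", "bamk", "road"]
lexicon = ["chennai", "hotel", "bank", "road", "school"]

corrected = [snap_to_lexicon(p, lexicon) for p in raw_predictions]
print(word_recognition_rate(raw_predictions, ground_truth))
print(word_recognition_rate(corrected, ground_truth))
```

Python strings are compared here code point by code point. Indian scripts often build one visible character from several code points (a base consonant plus vowel signs or other combining marks), so a real evaluation may prefer to normalize the text or to compare grapheme clusters.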

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/bbc493427219/jimaging-08-00086-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/59c02abb8415/jimaging-08-00086-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/972256659700/jimaging-08-00086-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/243ea7bca71c/jimaging-08-00086-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/2a05ae3ae7ef/jimaging-08-00086-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/49399c827468/jimaging-08-00086-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/cc848fbb1ec3/jimaging-08-00086-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/fbd14d0c0420/jimaging-08-00086-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/173fcbabff04/jimaging-08-00086-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/901d2bae9b29/jimaging-08-00086-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/8fd7eb0a2db2/jimaging-08-00086-g011a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acfc/9025185/07084087b0d6/jimaging-08-00086-g012.jpg

Similar articles

1
Improving Scene Text Recognition for Indian Languages with Transfer Learning and Font Diversity.
J Imaging. 2022 Mar 23;8(4):86. doi: 10.3390/jimaging8040086.
2
Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images.
Data Brief. 2020 May 21;31:105749. doi: 10.1016/j.dib.2020.105749. eCollection 2020 Aug.
3
Arbitrary Font Generation by Encoder Learning of Disentangled Features.
Sensors (Basel). 2022 Mar 19;22(6):2374. doi: 10.3390/s22062374.
4
IndicDialogue: A dataset of subtitles in 10 Indic languages for Indic language modeling.
Data Brief. 2024 Jul 3;55:110690. doi: 10.1016/j.dib.2024.110690. eCollection 2024 Aug.
5
Can very small font size enhance memory?
Mem Cognit. 2018 Aug;46(6):979-993. doi: 10.3758/s13421-018-0816-6.
6
Real-Time Lexicon-Free Scene Text Localization and Recognition.
IEEE Trans Pattern Anal Mach Intell. 2016 Sep;38(9):1872-85. doi: 10.1109/TPAMI.2015.2496234. Epub 2015 Oct 30.
7
Multilingual character recognition dataset for Moroccan official documents.
Data Brief. 2023 Dec 13;52:109953. doi: 10.1016/j.dib.2023.109953. eCollection 2024 Feb.
8
Fonts of wider letter shapes improve letter recognition in parafovea and periphery.
Ergonomics. 2022 May;65(5):753-761. doi: 10.1080/00140139.2021.1991001. Epub 2021 Oct 27.
9
HMM-based lexicon-driven and lexicon-free word recognition for online handwritten Indic scripts.
IEEE Trans Pattern Anal Mach Intell. 2012 Apr;34(4):670-82. doi: 10.1109/TPAMI.2011.234.
10
A Robot Object Recognition Method Based on Scene Text Reading in Home Environments.
Sensors (Basel). 2021 Mar 9;21(5):1919. doi: 10.3390/s21051919.
