National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China.
IEEE Trans Pattern Anal Mach Intell. 2012 Aug;34(8):1469-81. doi: 10.1109/TPAMI.2011.264.
This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores, geometric context, and linguistic context) from the Bayesian decision view, and convert the classifier outputs to posterior probabilities via confidence transformation. For path search, we use a refined beam search algorithm to improve search efficiency and a candidate character augmentation strategy to improve recognition accuracy. The combining weights of the path evaluation function are optimized by supervised learning under a Maximum Character Accuracy criterion. We evaluated the recognition performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7,356 classes and 5,091 pages of unconstrained handwritten texts. The experimental results show that confidence transformation and the combination of multiple contexts significantly improve text line recognition performance. On a test set of 1,015 handwritten pages, the proposed approach achieved a character-level accurate rate of 90.75 percent and a correct rate of 91.39 percent, both far superior to the best results previously reported in the literature.
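The abstract reports two character-level metrics without defining them. A minimal sketch of how they could be computed follows, assuming the convention commonly used in handwritten text recognition: with N characters in the ground-truth transcript and S substitution, D deletion, and I insertion errors from a minimum-edit-distance alignment, the correct rate is (N - S - D) / N and the accurate rate is (N - S - D - I) / N. Under that assumption the accurate rate can never exceed the correct rate, which agrees with the reported 90.75 and 91.39 percent. The function names `edit_counts` and `correct_and_accurate_rate` are hypothetical and not taken from the paper.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for one minimum-edit
    alignment of the recognized string `hyp` against the ground truth `ref`."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)          # all reference characters deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)          # all hypothesis characters inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, d, k = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, k)      # match, no cost
            else:
                diag = (c + 1, s + 1, d, k)
            c, s, d, k = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, k)
            c, s, d, k = dp[i][j - 1]
            insertion = (c + 1, s, d, k + 1)
            # Keep the cheapest option; ties are broken in favor of the diagonal.
            dp[i][j] = min(diag, deletion, insertion, key=lambda t: t[0])
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def correct_and_accurate_rate(ref, hyp):
    """Assumed definitions: CR = (N - S - D) / N, AR = (N - S - D - I) / N."""
    subs, dels, ins = edit_counts(ref, hyp)
    n = len(ref)
    correct_rate = (n - subs - dels) / n
    accurate_rate = (n - subs - dels - ins) / n
    return correct_rate, accurate_rate


# Toy usage: one substitution (c -> X) and one insertion (Y).
cr, ar = correct_and_accurate_rate("abcde", "abXdeY")
```

When several alignments share the minimum edit distance, the split between substitutions, deletions, and insertions can depend on the tie-breaking rule, so evaluation tools may differ slightly on such cases.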
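To make the path evaluation and path search steps concrete, the following is a simplified sketch of a generic beam search over an oversegmentation lattice. It is not the authors' refined beam search, and it omits the geometric context, candidate character augmentation, and learned combining weights. It assumes that each candidate character pattern (a run of consecutive primitive segments) already carries log posterior probabilities from a confidence-transformed classifier, and that a bigram language model supplies the linguistic context. All names (`beam_search`, `candidates`, `bigram_logp`, `lm_weight`) and all probabilities in the toy example are hypothetical.

```python
import math
from collections import defaultdict


def beam_search(n_segments, candidates, bigram_logp,
                lm_weight=1.0, beam=3, default_lm=math.log(1e-3)):
    """Find a high-scoring character string over a segmentation lattice.

    candidates[(i, j)] lists (char, log_posterior) pairs for the candidate
    pattern formed by merging primitive segments i .. j-1.
    bigram_logp[(prev, char)] is a log probability; unseen pairs fall back
    to `default_lm` (an arbitrary floor chosen for this sketch).
    """
    # beams[j] holds the best partial paths ending at segment boundary j.
    beams = defaultdict(list)
    beams[0] = [(0.0, "")]
    for j in range(1, n_segments + 1):
        expanded = []
        for (i, end), options in candidates.items():
            if end != j:
                continue
            for score, text in beams[i]:
                prev = text[-1] if text else "<s>"
                for ch, logp in options:
                    lm = bigram_logp.get((prev, ch), default_lm)
                    # Path score: classifier term plus weighted linguistic term.
                    expanded.append((score + logp + lm_weight * lm, text + ch))
        expanded.sort(key=lambda p: p[0], reverse=True)
        beams[j] = expanded[:beam]       # prune to the beam width
    return beams[n_segments][0] if beams[n_segments] else None


# Toy lattice: segments 0 and 1 can be read separately ("木", "寸") or merged ("村").
candidates = {
    (0, 1): [("木", math.log(0.6)), ("十", math.log(0.4))],
    (1, 2): [("寸", math.log(0.5)), ("又", math.log(0.5))],
    (0, 2): [("村", math.log(0.8)), ("材", math.log(0.2))],
    (2, 3): [("庄", math.log(0.9)), ("压", math.log(0.1))],
}
bigram_logp = {("<s>", "村"): math.log(0.1), ("村", "庄"): math.log(0.5)}

best_score, best_text = beam_search(3, candidates, bigram_logp)
```

In this toy lattice the merged reading wins because the linguistic context strongly favors the two-character word over the three-character alternative. Summing raw log scores also favors paths with fewer characters, so a practical path evaluation function would need some form of length normalization, which this sketch leaves out.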