National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Zhongguancun East Road, No. 95, Haidian District, Beijing 100190, P.R. China.
IEEE Trans Pattern Anal Mach Intell. 2012 Apr;34(4):707-22. doi: 10.1109/TPAMI.2011.151.
In this paper, we propose a metric rectification method that restores a document from a single camera-captured image. The core idea is to construct an isometric image mesh by exploiting the geometry of the page surface and the camera. Our method uses a general cylindrical surface (GCS) to model the curved page shape. Under a few reasonable assumptions, we show that the printed horizontal text lines are line convergent symmetric. This property is then used to constrain the estimation of various model parameters under perspective projection. We also introduce paraperspective projection to approximate the nonlinear perspective projection. From this, a set of closed-form formulas is derived for estimating the GCS directrix and the document aspect ratio. Our method provides a straightforward framework for the metric rectification of document images. It is insensitive to camera position, viewing angle, and page shape. To evaluate the proposed method, we conducted comprehensive experiments on both synthetic and real captured images. The results demonstrate the efficiency of our method. We also carried out a comparative experiment on the public CBDAR2007 data set. The experimental results show that our method outperforms state-of-the-art methods in terms of OCR accuracy and rectification error.
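The abstract says paraperspective projection is used to approximate perspective projection of a GCS-modeled page, but it does not give the formulas. The sketch below illustrates that idea under stated assumptions. It uses the standard textbook paraperspective model: a point is moved onto the plane Z = Z0 through a reference point along the line of sight to that reference point, and that plane is then imaged with the uniform scale f / Z0. It also uses the usual definition of a general cylinder, in which a directrix curve is swept along a fixed generatrix direction. The function names, the parabolic directrix, and the page placement are all hypothetical and are not the paper's implementation.

```python
# Minimal sketch (hypothetical, not the paper's code): compare perspective
# projection with a paraperspective approximation on points sampled from a
# general cylindrical surface (GCS) page expressed in camera coordinates.


def perspective(p, f):
    """Pinhole perspective projection of camera-frame point p = (X, Y, Z)."""
    X, Y, Z = p
    return (f * X / Z, f * Y / Z)


def paraperspective(p, ref, f):
    """Paraperspective projection of p about reference point ref = (X0, Y0, Z0).

    p is moved onto the plane Z = Z0 along the ray from the optical center
    through ref, then imaged with the uniform scale f / Z0. The mapping is
    affine in p and agrees with perspective to first order around ref.
    """
    X, Y, Z = p
    X0, Y0, Z0 = ref
    s = f / Z0
    return (s * (X - X0 * (Z - Z0) / Z0),
            s * (Y - Y0 * (Z - Z0) / Z0))


def gcs_point(x, y, directrix, generatrix):
    """Point on a general cylindrical surface: directrix(x) + y * generatrix."""
    dx, dy, dz = directrix(x)
    gx, gy, gz = generatrix
    return (dx + y * gx, dy + y * gy, dz + y * gz)


def max_discrepancy(points, ref, f):
    """Largest image-plane distance between the two projections over points."""
    worst = 0.0
    for p in points:
        u1, v1 = perspective(p, f)
        u2, v2 = paraperspective(p, ref, f)
        worst = max(worst, ((u1 - u2) ** 2 + (v1 - v2) ** 2) ** 0.5)
    return worst


def parabolic_directrix(x):
    """Hypothetical page cross-section: a shallow parabolic bend in depth."""
    return (x + 2.0, 0.5, 10.0 + 0.05 * x * x)


if __name__ == "__main__":
    f = 1.0
    generatrix = (0.0, 1.0, 0.0)  # straight rulings along the camera Y axis
    pts = [gcs_point(0.1 * i, 0.1 * j, parabolic_directrix, generatrix)
           for i in range(-10, 11) for j in range(-14, 15)]
    # Use the centroid of the sampled page points as the reference point.
    n = len(pts)
    ref = tuple(sum(c) / n for c in zip(*pts))
    print("max |perspective - paraperspective| =",
          max_discrepancy(pts, ref, f))
```

Because the approximation is exact at the reference point and first-order accurate around it, the discrepancy shrinks roughly quadratically as points get closer to the reference point. This behavior is why such an affine model can stand in for the nonlinear perspective map when the page is small relative to its distance from the camera.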