Liang Jian, DeMenthon Daniel, Doermann David
Amazon.com, 701 5th Avenue #614.B, Seattle, WA 98104, USA.
IEEE Trans Pattern Anal Mach Intell. 2008 Apr;30(4):591-605. doi: 10.1109/TPAMI.2007.70724.
Compared to typical scanners, handheld cameras offer convenient, flexible, portable, and non-contact image capture, which enables many new applications and breathes new life into existing ones. However, camera-captured documents may suffer from distortions caused by non-planar document shape and perspective projection, which cause current OCR technologies to fail. We present a geometric rectification framework for restoring the frontal-flat view of a document from a single camera-captured image. Our approach estimates 3D document shape from texture flow information obtained directly from the image, without requiring additional 3D/metric data or prior camera calibration. Our framework provides a unified solution for both planar and curved documents and can be applied in many camera-based document analysis applications, especially mobile ones. Experiments show that our method produces results that are significantly more OCR-compatible than the original images.
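As an illustration of the planar case only, the following is a minimal sketch of perspective rectification with a homography. It is not the authors' texture-flow method: it assumes the four page corners in the photo are already known, whereas the paper estimates document shape from the image itself. The corner coordinates, the output page size, and the function names `homography_from_points` and `apply_homography` are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ src, from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the arbitrary scale of H

def apply_homography(h, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical corners of a flat page as seen in the photo (pixels),
# listed clockwise from the top-left corner.
src = [(120, 80), (520, 110), (560, 620), (90, 600)]
# Assumed frontal-flat target: a 400 x 600 pixel upright rectangle.
dst = [(0, 0), (400, 0), (400, 600), (0, 600)]

H = homography_from_points(src, dst)
rectified = apply_homography(H, src)  # lands on the rectangle corners
```

A full rectifier would resample every pixel through the inverse of `H`. Curved pages need a non-planar surface model, which a single homography cannot represent.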