Université de Lyon, Lyon, France.
IEEE Trans Pattern Anal Mach Intell. 2010 Mar;32(3):431-47. doi: 10.1109/TPAMI.2009.33.
We present a new method for blind document bleed-through removal based on separate Markov Random Field (MRF) regularization for the recto and for the verso side, where separate priors are derived from the full graph. The segmentation algorithm is based on Bayesian Maximum a Posteriori (MAP) estimation. The advantages of this separate approach are the adaptation of the prior to the content creation process (e.g., superimposing two handwritten pages), and the improvement of the estimation of the recto pixels through an estimation of the verso pixels covered by recto pixels. Moreover, the formulation as a binary labeling problem with two hidden labels per pixel naturally leads to an efficient optimization method based on the minimum cut/maximum flow in a graph. The proposed method is evaluated on scanned document images from the 18th century, showing an improvement of character recognition results compared to other restoration methods.
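To illustrate how a binary MAP-MRF labeling reduces to a minimum cut, the following is a minimal sketch and not the paper's model: it assumes a single hidden label per pixel (the paper uses two, one for recto and one for verso), a simple Potts smoothness prior, and a plain Edmonds-Karp max-flow. The function name `min_cut_labels` and its parameters `unary0`, `unary1`, `edges`, and `lam` are hypothetical names chosen for this example.

```python
from collections import deque, defaultdict

def min_cut_labels(unary0, unary1, edges, lam):
    """MAP labeling of a binary Potts MRF via an s-t minimum cut (illustrative).

    unary0[i], unary1[i]: non-negative cost of giving node i label 0 / label 1.
    edges: list of (i, j) neighbor pairs; lam: penalty when their labels differ.
    """
    n = len(unary0)
    s, t = n, n + 1  # extra source and sink nodes
    cap = defaultdict(lambda: defaultdict(float))
    for i in range(n):
        cap[s][i] += unary1[i]  # cut if i ends on the sink side (label 1)
        cap[i][t] += unary0[i]  # cut if i stays on the source side (label 0)
    for i, j in edges:
        cap[i][j] += lam  # cut if i and j receive different labels
        cap[j][i] += lam

    def bfs_parents():
        # Breadth-first search over edges with remaining residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs_parents()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Walk back from the sink to collect the augmenting path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source get label 0, the rest label 1.
    return [0 if i in parent else 1 for i in range(n)]

# Toy 1-D "scan line": the observation is 1 1 0 1 1, with one isolated outlier.
obs = [1, 1, 0, 1, 1]
unary0 = [abs(o - 0) for o in obs]  # cost of label 0
unary1 = [abs(o - 1) for o in obs]  # cost of label 1
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(min_cut_labels(unary0, unary1, chain, lam=1))  # [1, 1, 1, 1, 1]
```

With `lam=1` the smoothness prior outweighs the single disagreeing observation, so the outlier is relabeled; with `lam=0` the result simply follows the observation. For real images a dedicated max-flow library would be preferable to this didactic implementation.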