IEEE Trans Image Process. 2016 Dec;25(12):5702-5712. doi: 10.1109/TIP.2016.2614133. Epub 2016 Sep 27.
Scanned images of historical documents often suffer from bleed-through, in which ink on one side of the page seeps through the paper and appears on the other side. In this paper, a new conditional random field (CRF)-based method is proposed to remove bleed-through from scanned images of historical documents. The proposed method requires the scanned image of only one side and is therefore referred to as a blind method. In general, a scanned historical document image is composed of three components: foreground, bleed-through, and background. Assuming that each of the three components follows a Gaussian distribution, the proposed method first establishes a conditional probability distribution (CPD) model for each component. The parameters of the component CPD models are estimated from an initial segmentation of the input image. CRFs are then used to capture the relations between the observed pixels in the scanned image and their corresponding labels, as well as the spatial relations between adjacent labels. The belief propagation algorithm is used to calculate the probability of each label for each pixel. Once the labeling is completed by choosing the most probable label for each pixel, the bleed-through component is removed from the input image by a random-filling inpainting algorithm. Experimental results on a real data set show that the proposed method preserves the foreground component very well and removes the bleed-through effectively.
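The Gaussian component models and the final filling step can be illustrated with a small sketch. This is not the authors' implementation: it labels each pixel independently by maximum Gaussian log-likelihood, so it omits the CRF's spatial term and belief propagation entirely, and its filling step simply samples from background intensities. All function names, the 1-D list of grayscale intensities, and the toy values are assumptions made for illustration. It also assumes that every label occurs at least once in the initial segmentation and that at least one pixel ends up labeled as background.

```python
import math
import random

LABELS = ("foreground", "bleed_through", "background")

def fit_gaussians(pixels, init_labels):
    """Estimate (mean, variance) per label from an initial segmentation."""
    params = {}
    for label in LABELS:
        values = [p for p, l in zip(pixels, init_labels) if l == label]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[label] = (mean, max(var, 1e-6))  # floor avoids a zero variance
    return params

def log_gaussian(x, mean, var):
    """Log-density of a univariate Gaussian at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def label_pixels(pixels, params):
    """Assign each pixel, independently, the label with the highest log-likelihood."""
    return [max(LABELS, key=lambda l: log_gaussian(p, *params[l])) for p in pixels]

def random_fill(pixels, labels, rng):
    """Replace bleed-through pixels with intensities drawn from background pixels."""
    background = [p for p, l in zip(pixels, labels) if l == "background"]
    return [rng.choice(background) if l == "bleed_through" else p
            for p, l in zip(pixels, labels)]

# Toy data (hypothetical): dark ink, mid-gray bleed-through, bright paper.
# The last pixel is deliberately mislabeled in the initial segmentation.
pixels = [20, 25, 30, 110, 120, 130, 220, 225, 230, 118]
init = ["foreground"] * 3 + ["bleed_through"] * 3 + ["background"] * 4

params = fit_gaussians(pixels, init)
labels = label_pixels(pixels, params)
restored = random_fill(pixels, labels, random.Random(0))
```

In this toy case the mislabeled pixel is reassigned to bleed-through because its intensity fits that Gaussian far better than the broad background one. In the method described above, the per-pixel likelihood would act only as the observation term, with the CRF adding the dependence between adjacent labels.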