Faculty of Computer Science and Engineering, GIK Institute, Topi, Pakistan.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" Area della Ricerca CNR di Pisa, Pisa, Italy.
PLoS One. 2023 Mar 22;18(3):e0282142. doi: 10.1371/journal.pone.0282142. eCollection 2023.
Ancient manuscripts are a rich source of history and civilization. Unfortunately, these documents are often affected by age- and storage-related degradation that impairs their readability and information content. In this paper, we propose a document restoration method that removes unwanted, interfering degradation patterns from ancient color manuscripts. We exploit different color spaces to highlight the spectral differences among the various layers of information usually present in these documents. At each image pixel, the spectral representations from all color spaces are stacked to form a feature vector. Principal component analysis (PCA) is applied to the whole data cube to decorrelate the color planes and enhance the separation among the patterns. The reduced data cube, together with the spatial information of each pixel, is used to perform a pixel-based segmentation in which each cluster represents a class of pixels that share similar color properties in the decorrelated color spaces. Assuming Gaussian distributions for the various classes, a Gaussian Mixture Model (GMM) is estimated from the data through the Expectation Maximization (EM) algorithm and then used to assign a label to each pixel. The interfering, unwanted classes can thus be removed by inpainting their pixels with the background texture. To preserve the original appearance of the document and reproduce the background texture, the detected degraded pixels are replaced through Gaussian conditional simulation, according to the surrounding context. Experiments are shown on manuscripts affected by different kinds of degradation, including manuscripts from the publicly available DIBCO 2018 and 2019 datasets. We observe that using only a few dominant PCA components accelerates the clustering process and provides a more accurate segmentation.
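The segmentation stage described above (stacking color spaces, PCA, then GMM labeling via EM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say which color spaces are used, how the spatial information enters the model, or how EM is initialized. The function names, the particular features (RGB, BT.601 luma and chroma differences, HSV-style value and saturation), the `spatial_weight` parameter, and the farthest-point seeding are all assumptions made here for illustration only.

```python
import numpy as np

def stack_color_spaces(rgb):
    """rgb: float array (H, W, 3) in [0, 1]. Returns an (H*W, 8) feature matrix.
    The choice of color features is an assumption, not taken from the paper."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b        # BT.601 luma
    cb, cr = b - y, r - y                        # unscaled chroma differences
    v = rgb.max(axis=-1)                         # HSV-style value
    s = (v - rgb.min(axis=-1)) / np.maximum(v, 1e-8)  # HSV-style saturation
    feats = np.stack([r, g, b, y, cb, cr, v, s], axis=-1)
    return feats.reshape(-1, feats.shape[-1])

def pca_reduce(X, n_components):
    """Project centered data onto its leading principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    return Xc @ Vt[:n_components].T

def fit_gmm_em(X, k, n_iter=30, reg=1e-6, seed=0):
    """Full-covariance GMM fitted with EM; returns one hard label per row of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: farthest-point seeds, then hard nearest-seed assignment.
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    d2_all = np.stack([((X - c) ** 2).sum(axis=1) for c in centers], axis=1)
    resp = np.eye(k)[d2_all.argmin(axis=1)]
    for _ in range(n_iter):
        # M-step: update weights, means, and covariances from responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = []
        for j in range(k):
            diff = X - means[j]
            covs.append((resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d))
        # E-step: log-responsibilities from Gaussian log-densities.
        log_r = np.empty((n, k))
        for j in range(k):
            L = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(L, (X - means[j]).T)
            log_det = 2.0 * np.log(np.diag(L)).sum()
            log_r[:, j] = np.log(weights[j]) - 0.5 * (
                d * np.log(2 * np.pi) + log_det + (z ** 2).sum(axis=0))
        log_r -= log_r.max(axis=1, keepdims=True)   # stabilize before exponentiating
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)

def segment(rgb, n_components=3, k=3, spatial_weight=0.1):
    """Label every pixel; spatial_weight is a hypothetical knob for pixel coordinates."""
    h, w, _ = rgb.shape
    X = pca_reduce(stack_color_spaces(rgb), n_components)
    rows, cols = np.mgrid[0:h, 0:w]
    xy = np.stack([rows.ravel() / max(h - 1, 1),
                   cols.ravel() / max(w - 1, 1)], axis=1)
    X = np.hstack([X, spatial_weight * xy])
    return fit_gmm_em(X, k).reshape(h, w)
```

Calling `segment(image, n_components=3, k=3)` on a float RGB array returns an integer label map with the same height and width as the image. Deciding which labels correspond to degradation, and inpainting those pixels, are separate steps that this sketch does not cover.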