Department of Mathematics, Computer Science and Physics, Università degli Studi di Udine, Via delle Scienze 206, 33100 Udine, Italy.
Department of Humanities and Cultural Heritage, Università degli Studi di Udine, Vicolo Florio 2/b, 33100 Udine, Italy.
Int J Neural Syst. 2023 Oct;33(10):2350052. doi: 10.1142/S0129065723500521. Epub 2023 Aug 10.
Over the years, the humanities community has increasingly requested artificial intelligence frameworks to support the study of cultural heritage. Document layout segmentation, which aims to identify the different structural components of a document page, is a particularly interesting task connected to this trend, especially for handwritten texts. While there are many effective approaches to this problem, they all rely on large amounts of data to train the underlying models. Such data is rarely available in a real-world scenario, because producing ground-truth segmentations with pixel-level precision is very time-consuming and often requires a degree of domain knowledge about the documents at hand. For this reason, in this paper we propose an effective few-shot learning framework for document layout segmentation that relies on two novel components, namely dynamic instance generation and a segmentation refinement module. This approach achieves performance comparable to the current state of the art on the popular Diva-HisDB dataset while relying on just a fraction of the available data.