Joint variation and ZhuYin dataset for Traditional Chinese document enhancement.

Author information

Lo Shi-Wei, Chou Hsiu-Mei, Wu Jyh-Horng

Affiliation

National Center for High-Performance Computing, Hsinchu, Taiwan.

Publication information

Sci Data. 2024 Nov 27;11(1):1295. doi: 10.1038/s41597-024-04146-7.

DOI:10.1038/s41597-024-04146-7

PMID:39604400

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11603144/

Abstract

Digital documents play a crucial role in contemporary information management. However, their quality can be significantly impacted by various factors such as hand-drawn annotations, image distortion, watermarks, stains, and degradation. Deep learning-based methods have emerged as powerful tools for document enhancement. However, their effectiveness relies heavily on the availability of high-quality training and evaluation datasets. Unfortunately, such benchmark datasets are relatively scarce, particularly in the domain of Traditional Chinese documents. We introduce a novel dataset termed "Joint Variation and ZhuYin dataset (JVZY)" to address this gap. This dataset comprises 20,000 images and 1.92 million words, encompassing various document degradation characteristics. It also includes unique phonetic symbols in Traditional Chinese, catering to the specific localization requirements. By releasing this dataset, we aim to construct a continuously evolving resource explicitly tailored to the diverse needs of Traditional Chinese document enhancement. This resource aims to facilitate the development of applications that can effectively address the challenges posed by unique phonetic symbols and varied file degradation characteristics encountered in Traditional Chinese documents.
