HCCD: A handwritten camera-captured dataset for document enhancement under varied degradation conditions.

Authors

Koushik K S, B J Bipin Nair, Rani N Shobha

Affiliations

Department of Computer Science, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Mysuru, India.

Department of Artificial Intelligence and Data Science, GITAM School of Technology, Bengaluru, GITAM (Deemed to be) University, India.

Publication details

Data Brief. 2025 Jul 2;61:111849. doi: 10.1016/j.dib.2025.111849. eCollection 2025 Aug.

DOI:10.1016/j.dib.2025.111849
PMID:40697364
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12281058/
Abstract

Enhancing degraded handwritten documents captured with smartphone cameras remains a significant challenge in document analysis. Although deep learning-based enhancement techniques have shown promise, their performance relies largely on the availability of meticulously labeled ground truth datasets. To address this gap, this study introduces the Handwritten Camera-Captured Dataset (HCCD) to support document enhancement and recognition tasks specific to real-world scenarios. Unlike existing datasets, which are captured in controlled environments with scanners or smartphone cameras, HCCD features real-time, camera-captured handwritten documents exhibiting a range of natural degradations. These degradations include motion blur, shadow artifacts, and uneven lighting, which reflect the challenges of real-life document digitization. In the proposed dataset, each handwritten document is paired with a high-quality enhanced image created through a combination of computer vision-based imaging techniques. The documents are in Roman script and were contributed by multiple individuals with varying handwriting styles. The dataset is valuable for training machine learning and deep learning models for image restoration, denoising, and OCR applications. Each sample is annotated with rich metadata for further targeted research, including degradation type, severity level, and writer-specific demographics.
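
The abstract describes each sample as a degraded capture paired with an enhanced image and annotated with degradation type and severity level. A minimal sketch of how such records might be filtered into training pairs is shown below. It is only an illustration: the abstract does not specify the dataset's file layout or metadata schema, so the field names, label strings, and file paths here are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    """One degraded/enhanced pair. All field names are hypothetical."""
    degraded_path: str   # camera-captured input image
    enhanced_path: str   # paired enhanced (ground truth) image
    degradation: str     # e.g. "motion_blur", "shadow", "uneven_lighting"
    severity: str        # e.g. "low", "medium", "high" (assumed labels)
    writer_id: str       # assumed identifier for the contributing writer


def select_pairs(
    samples: List[Sample],
    degradation: Optional[str] = None,
    severity: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (degraded, enhanced) path pairs matching the given metadata."""
    return [
        (s.degraded_path, s.enhanced_path)
        for s in samples
        if (degradation is None or s.degradation == degradation)
        and (severity is None or s.severity == severity)
    ]


# Made-up records for illustration only; they are not taken from HCCD.
SAMPLES = [
    Sample("deg/0001.jpg", "enh/0001.jpg", "motion_blur", "high", "w01"),
    Sample("deg/0002.jpg", "enh/0002.jpg", "shadow", "low", "w01"),
    Sample("deg/0003.jpg", "enh/0003.jpg", "motion_blur", "low", "w02"),
]

blur_pairs = select_pairs(SAMPLES, degradation="motion_blur")
counts = Counter(s.degradation for s in SAMPLES)
```

Filtering on metadata in this way would let a restoration model be trained or evaluated on one degradation type at a time, which is the kind of targeted research the abstract mentions.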

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b882/12281058/2bd005ac0e99/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b882/12281058/381bebbd3f3f/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b882/12281058/611bf3f2122e/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b882/12281058/ab07e7a25537/gr4.jpg
