A multi-purpose dataset of Devanagari script comprising of isolated numerals and vowels.

Authors

Prashanth Duddela Sai, Mehta R Vasanth Kumar, Challa Nagendra Panini

Affiliations

SCSVMV University, Kanchipuram 631561, India.

Sahyadri College of Engineering and Management, Mangaluru 575005, India.

Publication information

Data Brief. 2021 Dec 16;40:107723. doi: 10.1016/j.dib.2021.107723. eCollection 2022 Feb.

Abstract

This article presents a dataset of handwritten isolated characters of the Devanagari script. The Devanagari script contains 10 numerals, 13 vowels, and 33 consonants. The Devanagari Character dataset covers 23 of these characters: the 10 numerals and the 13 vowels. For each numeral, 2,400 handwritten samples were collected, and for each vowel, 1,400. The collected samples were digitized and pre-processed, and noisy images were removed during pre-processing. The final dataset contains 38,750 images, with 2,250 samples per numeral and 1,250 samples per vowel. The data are available as images and as comma-separated values (CSV), along with the corresponding labels. The dataset can be used for Optical Character Recognition (OCR) research and for deep learning. In India, Devanagari is the base script from which more than 120 languages have evolved; hence this dataset can serve as a base for Machine Learning research in these languages. The dataset is publicly available at https://data.mendeley.com/datasets/pxrnvp4yy8/2.
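
The counts in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the published dataset tooling: it assumes every numeral and every vowel class has exactly the stated number of samples, and the variable and function names are hypothetical. The collected and removed totals are derived from the per-class figures and are not stated in the abstract itself.

```python
# Class counts as stated in the abstract.
NUM_NUMERALS = 10
NUM_VOWELS = 13

# Samples per class before and after noisy images were removed.
COLLECTED_PER_CLASS = {"numeral": 2400, "vowel": 1400}
RETAINED_PER_CLASS = {"numeral": 2250, "vowel": 1250}


def total_images(per_class):
    """Total image count, assuming every class has exactly the stated size."""
    return NUM_NUMERALS * per_class["numeral"] + NUM_VOWELS * per_class["vowel"]


collected_total = total_images(COLLECTED_PER_CLASS)  # derived, not stated in the abstract
retained_total = total_images(RETAINED_PER_CLASS)    # should match the stated 38,750
removed_total = collected_total - retained_total     # derived count of discarded images

print(f"collected: {collected_total}")
print(f"retained:  {retained_total}")
print(f"removed:   {removed_total}")
```

Under these assumptions the retained total is 10 × 2,250 + 13 × 1,250 = 38,750, which agrees with the final dataset size given in the abstract.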

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c1d2/8713117/8d1edc1120fd/gr1.jpg
