Prashanth Duddela Sai, Mehta R Vasanth Kumar, Challa Nagendra Panini
SCSVMV University, Kanchipuram 631561, India.
Sahyadri College of Engineering and Management, Mangaluru 575005, India.
Data Brief. 2021 Dec 16;40:107723. doi: 10.1016/j.dib.2021.107723. eCollection 2022 Feb.
This article presents a dataset of handwritten isolated characters of the Devanagari script. The Devanagari script contains 10 numerals, 13 vowels, and 33 consonants. The Devanagari Character dataset covers 23 of these characters: the 10 numerals and the 13 vowels. For each numeral, 2,400 handwritten samples were collected, and 1,400 for each vowel. The collected samples were digitized and pre-processed, and noisy images were removed during pre-processing. The final dataset contains 38,750 images, with 2,250 samples for each numeral and 1,250 for each vowel. The data are available as images and as comma-separated values (CSV), with labels attached. The dataset can be used for Optical Character Recognition research and deep learning. In India, Devanagari is the base script from which more than 120 languages have evolved; hence this dataset can serve as a base for Machine Learning research in these languages. The dataset is publicly available at https://data.mendeley.com/datasets/pxrnvp4yy8/2.
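The sample counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes every class contributed exactly the stated per-class number of samples, and the helper name `total_samples` is hypothetical, not part of the dataset or its tooling.

```python
# Class counts stated in the abstract: 10 numerals and 13 vowels.
NUM_NUMERALS = 10
NUM_VOWELS = 13


def total_samples(per_numeral: int, per_vowel: int) -> int:
    """Total images, assuming every class has exactly the stated count."""
    return NUM_NUMERALS * per_numeral + NUM_VOWELS * per_vowel


collected = total_samples(2400, 1400)  # samples gathered before cleaning
retained = total_samples(2250, 1250)   # samples kept in the final dataset
removed = collected - retained         # images dropped during pre-processing

print(f"collected={collected}, retained={retained}, removed={removed}")
```

Under that assumption, the retained total matches the 38,750 images reported, and the difference from the collected total gives the number of images discarded as noisy.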