Sharma Arun, Satish Deepshikha, Sharma Sushmita, Gupta Dinesh
International Centre for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi 110067, India.
Data Brief. 2020 Oct 28;33:106460. doi: 10.1016/j.dib.2020.106460. eCollection 2020 Dec.
The dataset contains images of 10 of the 32 Indian basmati seed varieties notified by the Government of India. The basmati paddy varieties included are 1121, 1509, 1637, 1718, 1728, BAS-370, CSR 30, Type-3/Dehraduni Basmati, PB-1 and PB-6. Images of other seeds and related items commonly available in households are also included. The dataset therefore has 11 classes: ten classes each hold images of one basmati paddy variety, while the 11th class, named "Unknown", holds images of a mixture of two morphologically similar paddy varieties (1121 and 1509), different pulses, other grains and related food items. The Unknown class helps discriminate paddy seeds from other types of seeds and related food items. All images were captured manually under standard conditions using an apparatus developed in-house and a tablet with a five-megapixel (5 MP) camera. In total, 3210 RGB colour images were captured in JPG format. The images were pre-processed to produce ready-to-use images for training and testing machine learning-based models. AI-based paddy seed variety classification models have been developed using the dataset. The dataset can be used to build different types of AI-based models for adulteration detection and for automated classification (together with independent devices) at the time of rice threshing, and its classification potential can be extended by supplementing it with images of additional basmati varieties.
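As a rough illustration of how the 11 classes described above might be divided for training and testing, the following Python sketch performs a per-class (stratified) split. It is not the authors' procedure: the function name `stratified_split`, the 80/20 ratio, the fixed seed and the placeholder file names are all assumptions made for this example, and the abstract does not state how many images each class holds.

```python
import random
from collections import defaultdict

# Class labels as listed in the abstract: ten basmati varieties plus "Unknown".
CLASSES = [
    "1121", "1509", "1637", "1718", "1728", "BAS-370", "CSR 30",
    "Type-3/Dehraduni Basmati", "PB-1", "PB-6", "Unknown",
]


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split (name, label) pairs into train/test lists class by class.

    Hypothetical helper: the ratio and seed are illustrative defaults,
    not values taken from the dataset description.
    """
    by_label = defaultdict(list)
    for name, label in samples:
        by_label[label].append(name)

    rng = random.Random(seed)  # local generator keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        # Hold out at least one image per class when the class has more than one.
        n_test = max(1, round(len(names) * test_fraction)) if len(names) > 1 else 0
        test.extend((n, label) for n in names[:n_test])
        train.extend((n, label) for n in names[n_test:])
    return train, test


# Placeholder file names only; the real dataset layout may differ.
demo = [(f"class{c}_img{i:03d}.jpg", label)
        for c, label in enumerate(CLASSES) for i in range(10)]
train_set, test_set = stratified_split(demo)
print(len(train_set), len(test_set))
```

Splitting within each class keeps every variety, and the Unknown class, represented in both subsets, which matters when some classes are morphologically similar (for example 1121 and 1509).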