Islam Mohammad Manzurul, Ahmed Md Jubayer, Shafi Mahmud Bin, Das Aritra, Hasan Md Rakibul, Rafi Abdullah Al, Rashid Mohammad Rifat Ahmmad, Niloy Nishat Tasnim, Ali Md Sawkat, Chowdhury Abdullahi, Rasel Ahmed Abdal Shafi
Department of Computer Science and Engineering, East West University, Aftabnagar, Dhaka, Bangladesh.
Data Brief. 2024 Dec 19;58:111241. doi: 10.1016/j.dib.2024.111241. eCollection 2025 Feb.
In agriculture, and particularly in machine learning applications for agriculture, quality datasets are essential for advancing research and development. To address the challenges of identifying different mango leaf types and recognizing the diverse and unique characteristics of mango varieties in Bangladesh, a comprehensive and publicly accessible dataset titled "BDMANGO" has been created. The dataset contains leaf images of six mango varieties, Amrapali, Banana, Chaunsa, Fazli, Haribhanga, and Himsagar, collected from different locations. The images were captured using the rear cameras of a Google Pixel 6a and an iPhone XR and were stored at a resolution of 640 × 480 pixels. Both sides of each mango leaf were photographed against a white background to accurately reflect real-world scenarios in mango cultivation fields. The white background was specifically chosen to remove noise from the image samples, allowing for accurate feature extraction by machine learning algorithms. This helps ensure that a trained model remains effective at identifying a specific mango leaf when it is deployed alongside any segmentation algorithm. Additionally, image augmentation techniques (rotation, horizontal flip, vertical flip, width shift, height shift, shear, and zoom) were applied to expand the dataset from 837 original images to a total of 6696 images (837 original images and 5859 augmented images), an eightfold increase. This expansion significantly enhances the dataset's utility for training, testing, and validating machine learning models designed for classifying mango leaf varieties, thereby supporting research efforts in this domain.
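The reported counts imply seven augmented images per original (5859 / 837 = 7), which matches the seven techniques listed. The sketch below illustrates one plausible reading, in which each technique is applied once per image. This is an assumption: the abstract does not say how the techniques were combined, which library was used, or which parameter values were chosen. The function names `warp` and `augment`, the nearest-neighbour resampling, and all default parameters (20° rotation, 10% shifts, 0.2 shear, 1.2× zoom) are hypothetical and chosen only for illustration.

```python
import numpy as np


def warp(img, m, fill=255):
    """Apply a 3x3 affine matrix about the image centre (nearest neighbour).

    `m` maps output pixel coordinates to input pixel coordinates; pixels that
    fall outside the source image are filled with `fill` (white by default,
    matching the white background described in the abstract).
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # Centre-relative output coordinates in homogeneous form, shape (3, h*w).
    coords = np.stack([xs - cx, ys - cy, np.ones_like(xs, dtype=float)])
    src = m @ coords.reshape(3, -1)
    sx = np.rint(src[0] + cx).astype(int)
    sy = np.rint(src[1] + cy).astype(int)
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat = np.full((h * w,) + img.shape[2:], fill, dtype=img.dtype)
    flat[inside] = img[sy[inside], sx[inside]]
    return flat.reshape(img.shape)


def augment(img, angle_deg=20.0, shift_frac=0.1, shear=0.2, zoom=1.2):
    """Return one augmented copy per technique (all parameters hypothetical)."""
    h, w = img.shape[:2]
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    mats = {
        "rotation": np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]]),
        "horizontal_flip": np.diag([-1.0, 1.0, 1.0]),
        "vertical_flip": np.diag([1.0, -1.0, 1.0]),
        "width_shift": np.array([[1, 0, shift_frac * w], [0, 1, 0], [0, 0, 1]]),
        "height_shift": np.array([[1, 0, 0], [0, 1, shift_frac * h], [0, 0, 1]]),
        "shear": np.array([[1, shear, 0], [0, 1, 0], [0, 0, 1]]),
        "zoom": np.diag([1 / zoom, 1 / zoom, 1.0]),
    }
    return {name: warp(img, m) for name, m in mats.items()}


# Synthetic stand-in for one 640 x 480 image: a dark green patch on white.
image = np.full((480, 640, 3), 255, dtype=np.uint8)
image[200:280, 100:500] = (30, 120, 40)

variants = augment(image)
n_originals = 837
print(n_originals * (1 + len(variants)))  # 6696
```

In practice, augmentation pipelines often sample random parameters and combine several transforms per output image; seven random draws per original would produce the same totals, so the counts alone cannot distinguish the two approaches.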