Arrechea-Castillo Darwin Alexis, Espitia-Buitrago Paula, Florian-Vargas David, Arboleda Ronald David, Velázquez-Hernández Riquelmer, Ruiz-Hurtado Andrés Felipe, Hernandez Luis Miguel, Jauregui Rosa N, Cardoso Juan Andrés
International Center for Tropical Agriculture (CIAT), A.A. 6713, Km 17 recta Cali-Palmira, Palmira, Colombia.
Grupo Papalotla, C.P, Ocozocoautla - Cintalapa km 110, Col. El Aguacero, Ocozocoautla de Espinosa, Chiapas 29140, México.
Data Brief. 2025 Apr 28;60:111593. doi: 10.1016/j.dib.2025.111593. eCollection 2025 Jun.
This dataset is an expanded version of a previously published collection of high-resolution RGB images of genotypes, initially designed to facilitate automated classification of phenological stages and raceme identification in forage breeding trials. The original dataset included 2400 images of 200 genotypes captured under controlled conditions, supporting the development of computer vision models for High-Throughput Phenotyping (HTP). In this updated release, 139 additional images and 24,983 new annotations have been added, bringing the dataset to a total of 2539 images and 47,323 raceme annotations. This version introduces increased diversity in image-capture conditions, with data collected from two geographic locations (Palmira, Colombia, and Ocozocoautla de Espinosa, Mexico) and a range of image-capture devices, including smartphones (e.g., Realme C53 and Oppo Reno 11), a Nikon D5600 camera, and a Phantom 4 Pro V2 drone. Images now vary in perspective (nadir, high-angle, and frontal) and capture distance (1-3 meters), enhancing the dataset's applicability for training robust Deep Learning (DL) models. Compared to the original dataset, raceme density per plant has nearly doubled in some samples, producing greater raceme overlap for advanced instance segmentation tasks. This expanded dataset supports deeper exploration of phenotypic variation in spp. and offers greater potential for developing adaptable models in crop phenotyping.