Rodríguez Paula, Parte Rubén, González Guillermo A, Gacho Alejandra, Santos Darío, Usamentiaga Rubén, Pedrayes Oscar D
Research Initiative Galicia & Asturias, NTT Data Spain, Enrique 6 Mariñas 36, A Coruña, 15009, Galicia, Spain.
Department of Computer Science, Engineering, University of Oviedo, Gijón, Spain.
Data Brief. 2025 May 2;60:111610. doi: 10.1016/j.dib.2025.111610. eCollection 2025 Jun.
Advancements in computer vision and deep learning have transformed ecological monitoring and species identification, enabling automated and accurate data labelling. Despite this progress, robust AI-driven solutions for avian species recognition remain limited, primarily because high-quality annotated datasets are scarce. To address this gap, this article introduces IBERBIRDS, a comprehensive and publicly accessible dataset designed to facilitate the automatic detection and classification of flying bird species in the Iberian Peninsula under real-world conditions. The dataset comprises 4000 images representing 10 ecologically significant medium- to large-sized bird species, and each image is annotated with bounding-box coordinates in the YOLO detection format. Existing datasets typically feature close-up or ideal-condition imagery. IBERBIRDS instead focuses on mid- to long-range photographs of birds in flight, which gives a more realistic and challenging representation of the scenarios commonly encountered in birdwatching, conservation, and ecological monitoring. Images were sourced from publicly available, expert-validated ornithology platforms and underwent rigorous quality control to ensure annotation accuracy and consistency. This process included homogenizing color profiles and formats, as well as manual refinement to ensure that each image contains a single bird specimen. Detailed provenance and taxonomic metadata for each image have also been systematically integrated into the dataset. The lack of pre-annotated datasets has significantly restricted large-scale ecological analysis and the development of automated techniques in avian research, hindering the progress of AI-driven solutions tailored to bird species recognition.
By addressing this gap, the dataset serves as a comprehensive benchmark for avian studies and supports advances in applications such as conservation initiatives, environmental impact assessments, biodiversity preservation strategies, real-time tracking systems, and video-based analysis. IBERBIRDS is also a resource for computer vision applications and for educational programs tailored to ornithologists and birdwatching communities. By making the dataset openly available, IBERBIRDS promotes scientific collaboration and technological progress, ultimately contributing to the preservation and understanding of avian biodiversity.
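The abstract states that every image carries bounding-box coordinates in the YOLO detection format and contains exactly one bird. The code below is a minimal sketch of how such a label might be read and checked. It assumes the common YOLO convention: one line per object, giving a class index followed by the box centre x, centre y, width, and height, all normalized to [0, 1] by image size. The helper names (`YoloBox`, `parse_yolo_labels`, `check_single_specimen`) and the example label are hypothetical and are not taken from the IBERBIRDS release. The class-index-to-species mapping is also not assumed here.

```python
"""Sketch: read a YOLO detection-format label and enforce one box per image.

Assumed (common YOLO convention, not confirmed by the IBERBIRDS release):
    <class_id> <x_center> <y_center> <width> <height>
with all four coordinates normalized to [0, 1] by image width/height.
All names below are hypothetical helpers for illustration.
"""
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class YoloBox:
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float

    def to_pixels(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert normalized centre/size into absolute (x_min, y_min, x_max, y_max)."""
        cx, cy = self.x_center * img_w, self.y_center * img_h
        half_w, half_h = self.width * img_w / 2, self.height * img_h / 2
        return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def parse_yolo_labels(text: str) -> List[YoloBox]:
    """Parse the contents of one YOLO .txt label file, skipping blank lines."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        cls, *coords = parts
        values = [float(v) for v in coords]
        if not all(0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"line {line_no}: coordinates must be normalized to [0, 1]")
        boxes.append(YoloBox(int(cls), *values))
    return boxes


def check_single_specimen(text: str) -> YoloBox:
    """Return the only box in a label file, mirroring the one-bird-per-image rule."""
    boxes = parse_yolo_labels(text)
    if len(boxes) != 1:
        raise ValueError(f"expected exactly one bird, found {len(boxes)}")
    return boxes[0]


if __name__ == "__main__":
    # Hypothetical label for a 1920x1080 image: class 3, small box near the centre.
    label = "3 0.5 0.4 0.1 0.05\n"
    box = check_single_specimen(label)
    print(box.class_id, box.to_pixels(1920, 1080))
```

The sketch stores normalized values and converts them only on demand. This keeps one label valid after the image is resized, which is the main reason YOLO-style annotations are resolution-independent.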