Barkmann Friederike, Lindner Andreas, Würflinger Ronald, Höttinger Helmut, Rüdisser Johannes
Department of Ecology, University of Innsbruck, Innsbruck, Austria.
Advanced Computing Austria ACA GmbH, Wien, Austria.
Sci Data. 2025 Aug 6;12(1):1369. doi: 10.1038/s41597-025-05708-z.
Deep learning models can accelerate the processing of image-based biodiversity data and provide educational value by giving direct feedback to citizen scientists. However, training such models requires large amounts of labelled data, and not all species are equally suited to identification from images alone. Most butterfly and many moth species (Lepidoptera), which play an important role as biodiversity indicators, are well suited to such approaches. This dataset contains more than 540,000 images of 185 butterfly and moth species occurring in Austria. Images were collected by citizen scientists with the app "Schmetterlinge Österreichs", and correct species identification was ensured by an experienced entomologist. The number of images per species ranges from one to nearly 30,000. Such strong class imbalance is common in datasets of species records. The dataset is larger than other published datasets of butterfly and moth images and offers opportunities for training and evaluating machine learning models on the fine-grained classification task of species identification.