Shepley Andrew, Falzon Greg, Meek Paul, Kwan Paul
School of Science and Technology, University of New England, Armidale, NSW, Australia.
College of Science and Engineering, Flinders University, Adelaide, SA, Australia.
Ecol Evol. 2021 Mar 10;11(9):4494-4506. doi: 10.1002/ece3.7344. eCollection 2021 May.
A time-consuming challenge faced by camera trap practitioners is the extraction of meaningful data from images to inform ecological management. An increasingly popular solution is automated image classification software. However, most solutions are not sufficiently robust to be deployed on a large scale, due to a lack of location invariance when transferring models between sites. This prevents optimal use of ecological data, resulting in significant expenditure of time and resources to annotate and retrain deep learning models.

We present a method ecologists can use to develop optimized location-invariant camera trap object detectors by (a) evaluating publicly available image datasets characterized by high intradataset variability for training deep learning models for camera trap object detection and (b) using small subsets of camera trap images to optimize models for high-accuracy domain-specific applications.

We collected and annotated three datasets of images of striped hyena, rhinoceros, and pigs from the image-sharing websites FlickR and iNaturalist (FiN) to train three object detection models. We compared the performance of these models to that of three models trained on the Wildlife Conservation Society and Camera CATalogue datasets, when tested on out-of-sample Snapshot Serengeti datasets. We then increased FiN model robustness by infusing small subsets of camera trap images into training.

In all experiments, the mean Average Precision (mAP) of the FiN-trained models was significantly higher (82.33%-88.59%) than that achieved by the models trained only on camera trap datasets (38.5%-66.74%). Infusion further improved mAP by 1.78%-32.08%.

Ecologists can use FiN images to train deep learning object detection solutions for camera trap image processing, yielding location-invariant, robust, out-of-the-box software. Models can be further optimized by infusing 5%-10% camera trap images into the training data. This would allow AI technologies to be deployed on a large scale in ecological applications. Datasets and code related to this study are open source and available in this repository: https://doi.org/10.5061/dryad.1c59zw3tx.
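The infusion step described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (the function name, list-of-filenames representation, and the choice to size the subset relative to the FiN set are assumptions, not the authors' published pipeline): it draws a small random subset of camera trap images and appends it to the FiN training set.

```python
import random

def infuse_training_set(fin_images, camera_trap_images, fraction=0.1, seed=42):
    """Combine a FiN training set with a small random subset of camera
    trap images, mimicking the 5%-10% 'infusion' the abstract describes.

    NOTE: `fraction` is taken relative to the FiN set size here; this is
    an illustrative assumption, not necessarily the paper's definition.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    # Size of the camera trap subset, capped by what is available.
    k = min(len(camera_trap_images), max(1, round(fraction * len(fin_images))))
    subset = rng.sample(camera_trap_images, k)
    # The combined list would then feed the object detector's training loop.
    return fin_images + subset
```

For example, infusing 5% of a 1,000-image FiN set adds 50 camera trap images, giving a 1,050-image training list:

```python
fin = [f"fin_{i}.jpg" for i in range(1000)]
ct = [f"trap_{i}.jpg" for i in range(500)]
train = infuse_training_set(fin, ct, fraction=0.05)
# len(train) == 1050
```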