Stefan Schneider, Saul Greenberg, Graham W. Taylor, Stefan C. Kremer
School of Computer Science, University of Guelph, Guelph, ON, Canada.
Department of Computer Science, University of Calgary, Calgary, AB, Canada.
Ecol Evol. 2020 Mar 7;10(7):3503-3517. doi: 10.1002/ece3.6147. eCollection 2020 Apr.
Ecological camera traps are increasingly used by wildlife biologists to unobtrusively monitor an ecosystem's animal population. However, manual inspection of the images produced is expensive, laborious, and time-consuming. The success of deep learning systems using camera trap images has previously been explored in preliminary stages. These studies, however, are lacking in their practicality: they focus primarily on extremely large datasets, often millions of images, and pay little to no attention to performance when the task is species identification in new locations not seen during training. Our goals were to test the capabilities of deep learning systems trained on camera trap images using modestly sized training data, to compare performance when considering unseen background locations, and to quantify the gradient of lower-bound performance as a guideline relating data requirements to performance expectations. We use a dataset provided by Parks Canada containing 47,279 images collected from 36 unique geographic locations across multiple environments. Images represent 55 animal species and human activity, with high class imbalance. We trained, tested, and compared the capabilities of six deep learning computer vision networks using transfer learning and image augmentation: DenseNet201, Inception-ResNet-V3, InceptionV3, NASNetMobile, MobileNetV2, and Xception. On "trained" locations, DenseNet201 performed best with 95.6% top-1 accuracy, showing promise for deep learning methods in smaller-scale research efforts. Using trained locations, classes with <500 images had low and highly variable recall of 0.750 ± 0.329, while classes with over 1,000 images had high and stable recall of 0.971 ± 0.0137. Models tasked with classifying species from untrained locations were less accurate, with DenseNet201 performing best at 68.7% top-1 accuracy.
Finally, we provide an open repository where ecologists can insert their image data to train and test custom species detection models for their desired ecological domain.
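The transfer-learning and image-augmentation approach the abstract describes can be sketched in Keras: an ImageNet-pretrained backbone (MobileNetV2 here, one of the six networks compared) is frozen and topped with a new softmax head for the 55 species classes. This is a minimal illustrative sketch, not the authors' pipeline; the input size, augmentation choices, and all hyperparameters are assumptions.

```python
# Hedged sketch of transfer learning with image augmentation.
# MobileNetV2 is one of the six networks compared in the study;
# input size and hyperparameters below are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 55  # species classes in the Parks Canada dataset

# Simple augmentation layers; active only during training.
augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

# ImageNet-pretrained backbone with its classification head removed.
base = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze pretrained convolutional features

model = models.Sequential([
    augment,
    base,
    layers.Dense(NUM_CLASSES, activation="softmax"),  # new head
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

After an initial training pass on the new head, the backbone can optionally be unfrozen for fine-tuning at a lower learning rate.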
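The recall figures quoted in the abstract (e.g., 0.971 ± 0.0137 for classes with over 1,000 images) are per-class recall: for each species, correct predictions divided by that species' ground-truth occurrences. A stdlib-only sketch, with hypothetical labels for illustration:

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Recall per class: true positives / ground-truth positives."""
    tp = defaultdict(int)      # correct predictions per class
    total = defaultdict(int)   # ground-truth count per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            tp[t] += 1
    return {c: tp[c] / total[c] for c in total}

# Hypothetical labels, not from the study's data.
y_true = ["elk", "elk", "wolf", "wolf", "wolf", "human"]
y_pred = ["elk", "wolf", "wolf", "wolf", "elk", "human"]
print(per_class_recall(y_true, y_pred))
# elk: 0.5, wolf: ~0.667, human: 1.0
```

The mean and standard deviation of these per-class values, grouped by training-image count, yield summary statistics like those reported.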