Institut de Recherche pour le Developpement (IRD), UMR ENTROPIE (IRD, University of New-Caledonia, University of La Reunion, CNRS, Ifremer), 101 Promenade Roger Laroque, 98848 Noumea, France.
Sensors (Basel). 2022 Jan 10;22(2):497. doi: 10.3390/s22020497.
With the availability of low-cost and efficient digital cameras, ecologists can now survey the world's biodiversity through image sensors, especially in the previously rather inaccessible marine realm. However, data accumulate rapidly, and ecologists face a data processing bottleneck. While computer vision has long been used as a tool to speed up image processing, it is only since the breakthrough of deep learning (DL) algorithms that a revolution in the automatic assessment of biodiversity by video recording can be considered. However, current applications of DL models to biodiversity monitoring do not consider some universal rules of biodiversity, especially rules on the distribution of species abundance, species rarity and ecosystem openness. Yet these rules imply three issues for deep learning applications. First, the imbalance of long-tail datasets biases the training of DL models. Second, scarce data greatly lessens the performance of DL models for classes with few examples. Third, the open-world issue implies that objects absent from the training dataset are incorrectly classified in the application dataset. Promising solutions to these issues are discussed, including data augmentation, data generation, cross-entropy modification, few-shot learning and open set recognition. At a time when biodiversity faces the immense challenges of climate change and the Anthropocene defaunation, stronger collaboration between computer scientists and ecologists is urgently needed to unlock the automatic monitoring of biodiversity.
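As a rough illustration of two of the solution families named above, the sketch below shows one possible form of cross-entropy modification (inverse-frequency class weighting) and a very simple open set baseline (rejecting predictions whose maximum softmax probability is below a threshold). It is not the method of the paper: the weighting formula, the threshold value, the function names and the class labels `common_fish` and `rare_fish` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, target, weights, classes):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    probs = softmax(logits)
    return -weights[target] * math.log(probs[classes.index(target)])


def predict_open_set(logits, classes, threshold=0.5):
    """Return the top class, or 'unknown' if the model is not confident.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best] if probs[best] >= threshold else "unknown"


# Hypothetical long-tail dataset: 90 images of one species, 10 of another.
classes = ["common_fish", "rare_fish"]
labels = ["common_fish"] * 90 + ["rare_fish"] * 10
weights = inverse_frequency_weights(labels)

# With identical logits, an error on the rare class costs more.
loss_common = weighted_cross_entropy([0.0, 0.0], "common_fish", weights, classes)
loss_rare = weighted_cross_entropy([0.0, 0.0], "rare_fish", weights, classes)

# A confident prediction is kept; an ambiguous one is rejected as unknown.
confident = predict_open_set([5.0, 0.0], classes, threshold=0.9)
ambiguous = predict_open_set([0.1, 0.0], classes, threshold=0.9)
```

With these weights, a misclassified rare-class image contributes more to the loss than a misclassified common-class image, which is one way to counter the bias introduced by long-tail datasets. The thresholding step only approximates open set recognition: a model can still be confidently wrong on species it has never seen.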