School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
Department of Entomology and Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA.
Sci Rep. 2023 Jul 18;13(1):11566. doi: 10.1038/s41598-023-28734-6.
Deep learning (DL) based detection models are powerful tools for large-scale analysis of dynamic biological behaviors in video data. Supervised training of a DL detection model often requires a large amount of manually labeled training data, which is time-consuming and labor-intensive to acquire. In this paper, we propose LFAGPA (Learn From Algorithm-Generated Pseudo-Annotations), which uses noisy annotations generated automatically by algorithms to train DL models for ant detection in videos. Our method consists of two main steps: (1) extract foreground objects using one or more state-of-the-art foreground extraction algorithms; (2) treat the results from step (1) as pseudo-annotations and use them to train deep neural networks for ant detection. We tackle several challenges in this learning framework: how to make use of automatically generated noisy annotations, how to learn from multiple annotation sources, and how to combine algorithm-generated annotations with human-labeled annotations when available. In experiments, we evaluate our method on 82 videos (20,348 image frames in total) captured under natural conditions in a tropical rainforest for the study of dynamic ant behavior. Using only algorithm-generated annotations, with no manual annotation cost, our method achieves a decent detection performance (77% F1 score). Moreover, with only 10% of the manual annotations, our method can train a DL model that performs as well as one trained on the full set of human annotations (81% F1 score).
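Step (1) of the method can be illustrated with a small sketch. The abstract does not name the foreground extraction algorithms used, so the code below substitutes a deliberately simple one as an assumption: per-pixel median background subtraction, followed by connected-component grouping into bounding boxes. All function names, the threshold, and the minimum-area filter are hypothetical and are not taken from the paper.

```python
from statistics import median

def median_background(frames):
    """Estimate a static background as the per-pixel median over time (assumed stand-in)."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

def boxes_from_mask(mask, min_area=2):
    """Group 4-connected foreground pixels; return inclusive (x_min, y_min, x_max, y_max) boxes."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            stack, ys, xs = [(y, x)], [], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(ys) >= min_area:  # drop tiny components, which are likely noise
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def pseudo_annotations(frames, threshold=30, min_area=2):
    """Return one list of pseudo-annotation boxes per frame."""
    background = median_background(frames)
    return [boxes_from_mask(foreground_mask(f, background, threshold), min_area)
            for f in frames]

def make_frame(x0):
    """Synthetic 6x8 grayscale frame: bright background, dark 2x2 'ant' at column x0."""
    frame = [[100] * 8 for _ in range(6)]
    for y in (2, 3):
        for x in (x0, x0 + 1):
            frame[y][x] = 20
    return frame

frames = [make_frame(x0) for x0 in range(5)]  # the blob moves one pixel per frame
frames[2][0][7] = 200                         # isolated noise pixel in one frame
annotations = pseudo_annotations(frames)
print(annotations[0])  # [(0, 2, 1, 3)]
print(annotations[2])  # [(2, 2, 3, 3)]  (the noise pixel is filtered out)
```

In step (2), boxes like these would serve as noisy training labels for a detector. The minimum-area filter hints at why the noise matters: a lone noise pixel passes the threshold but would be a false annotation if it were kept.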