Mou Chao, Liang Aokang, Hu Chunying, Meng Fanyu, Han Baixun, Xu Fu
School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China.
Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China.
Animals (Basel). 2023 Oct 10;13(20):3168. doi: 10.3390/ani13203168.
Intelligent monitoring of endangered and rare wildlife is important for biodiversity conservation. In practical monitoring, little animal data is available to train recognition algorithms, so a monitoring system must achieve high accuracy with limited resources. At the same time, zoologists expect the system to be able to detect unknown species, which could lead to significant discoveries. To date, no existing algorithm offers both capabilities. This paper therefore proposes a method called KI-CLIP. First, we introduce CLIP, a foundation deep learning model that has not previously been applied in animal fields, and add a shallow network to exploit its strong recognition capability with few training resources. Second, inspired by zoologists' ability to recognize a species from a single image, we incorporate easily accessible expert description texts to improve performance with few samples. Finally, we design a simple incremental learning module to detect unknown species. We conducted extensive comparative experiments, ablation experiments, and case studies on 12 datasets containing real data. The results validate the effectiveness of KI-CLIP, which can be trained on multiple real scenarios in seconds and, in our study, achieved over 90% recognition accuracy with only 8 training samples and over 97% with 16 training samples. In conclusion, KI-CLIP is suitable for practical animal monitoring.
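The three ideas in the abstract (few-shot recognition on top of frozen embeddings, use of description texts, and flagging unknown species) can be illustrated with a toy sketch. This is not the authors' KI-CLIP implementation: the abstract does not specify the shallow network or the incremental module, so the sketch swaps in a simple prototype-averaging classifier with a similarity threshold. The 3-dimensional vectors stand in for CLIP image and text embeddings, and the function names, the blending weight `alpha`, the threshold `unknown_threshold`, and the species labels are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def build_prototype(image_embs, text_emb=None, alpha=0.5):
    """Average a few image embeddings and, if a description-text
    embedding is given, blend it in with weight alpha (assumed value)."""
    dim = len(image_embs[0])
    mean_img = normalize(
        [sum(e[i] for e in image_embs) / len(image_embs) for i in range(dim)]
    )
    if text_emb is None:
        return mean_img
    t = normalize(text_emb)
    return normalize([alpha * t[i] + (1 - alpha) * mean_img[i] for i in range(dim)])

def classify(query, prototypes, unknown_threshold=0.8):
    """Return (label, score); label is "unknown" when no prototype
    is similar enough (threshold is an assumed value)."""
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = cosine(query, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < unknown_threshold:
        return "unknown", best_score
    return best_label, best_score

# Toy stand-ins for embeddings: two known species, two images each,
# plus one description-text embedding per species.
prototypes = {
    "panda": build_prototype([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]], [1.0, 0.0, 0.0]),
    "crested_ibis": build_prototype([[0.1, 0.9, 0.0], [0.0, 1.0, 0.1]], [0.0, 1.0, 0.0]),
}

known = classify([0.95, 0.05, 0.0], prototypes)   # close to the panda prototype
novel = classify([0.0, 0.0, 1.0], prototypes)     # far from every prototype

# Incremental step: register the unfamiliar sample as a new class
# (image-only prototype, since no description text exists yet).
prototypes["candidate_species"] = build_prototype([[0.0, 0.1, 1.0]])
relabeled = classify([0.0, 0.0, 1.0], prototypes)

print(known[0], novel[0], relabeled[0])
```

In a real pipeline the toy vectors would be replaced by embeddings from a pretrained CLIP encoder, and the threshold would need tuning on held-out data.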