

Automated Taxonomic Identification of Insects with Expert-Level Accuracy Using Effective Feature Transfer from Convolutional Networks.

Affiliations

Savantic AB, Rosenlundsgatan 52, 118 63 Stockholm, Sweden.

Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Frescativagen 40, 114 18 Stockholm, Sweden.

Publication information

Syst Biol. 2019 Nov 1;68(6):876-895. doi: 10.1093/sysbio/syz014.

Abstract

Rapid and reliable identification of insects is important in many contexts, from the detection of disease vectors and invasive species to the sorting of material from biodiversity inventories. Because of the shortage of adequate expertise, there has long been an interest in developing automated systems for this task. Previous attempts have been based on laborious and complex handcrafted extraction of image features, but in recent years it has been shown that sophisticated convolutional neural networks (CNNs) can learn to extract relevant features automatically, without human intervention. Unfortunately, reaching expert-level accuracy in CNN identifications requires substantial computational power and huge training data sets, which are often not available for taxonomic tasks. This can be addressed using feature transfer: a CNN that has been pretrained on a generic image classification task is exposed to the taxonomic images of interest, and information about its perception of those images is used in training a simpler, dedicated identification system. Here, we develop an effective method of CNN feature transfer, which achieves expert-level accuracy in taxonomic identification of insects with training sets of 100 images or less per category, depending on the nature of the data set. Specifically, we extract rich representations of intermediate to high-level image features from the CNN architecture VGG16 pretrained on the ImageNet data set. This information is submitted to a linear support vector machine classifier, which is trained on the target problem. We tested the performance of our approach on two types of challenging taxonomic tasks: 1) identifying insects to higher groups when they are likely to belong to subgroups that have not been seen previously and 2) identifying visually similar species that are difficult to separate even for experts.
For the first task, our approach reached >92% accuracy on one data set (884 face images of 11 families of Diptera, all specimens representing unique species), and >96% accuracy on another (2936 dorsal habitus images of 14 families of Coleoptera, over 90% of specimens belonging to unique species). For the second task, our approach outperformed a leading taxonomic expert on one data set (339 images of three species of the Coleoptera genus Oxythyrea; 97% accuracy), and both humans and traditional automated identification systems on another data set (3845 images of nine species of Plecoptera larvae; 98.6% accuracy). Reanalyzing several biological image identification tasks studied in the recent literature, we show that our approach is broadly applicable and provides significant improvements over previous methods, whether based on dedicated CNNs, CNN feature transfer, or more traditional techniques. Thus, our method, which is easy to apply, can be highly successful in developing automated taxonomic identification systems even when training data sets are small and computational budgets limited. We conclude by briefly discussing some promising CNN-based research directions in morphological systematics opened up by the success of these techniques in providing accurate diagnostic tools.
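
The abstract describes a two-stage pipeline: fixed features taken from a pretrained CNN (VGG16), followed by a linear support vector machine trained on the target task. The sketch below illustrates only the second stage and is not the authors' implementation. Several things are assumptions made for illustration: the feature vectors are synthetic Gaussian blobs standing in for CNN features, the problem is binary rather than multi-class, and the trainer is a simple hand-written subgradient descent on the regularized hinge loss. All function names and parameter values are hypothetical.

```python
import random


def make_toy_features(n_per_class, dim=8, seed=0):
    """Generate synthetic vectors standing in for CNN features (hypothetical data)."""
    rng = random.Random(seed)
    features, labels = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            # Each class is a Gaussian blob centered at +1 or -1 on every axis.
            features.append([rng.gauss(label, 0.5) for _ in range(dim)])
            labels.append(label)
    return features, labels


def score(w, b, x):
    """Signed distance-like score of x under the linear model (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(features, labels, epochs=20, lr=0.1, reg=0.01, seed=0):
    """Fit a binary linear SVM (labels +1/-1) by stochastic subgradient descent
    on the L2-regularized hinge loss. The features themselves are never updated,
    mirroring the idea of training only a simple classifier on transferred features."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * score(w, b, x)
            # Regularization step: shrink the weights slightly.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move the boundary toward classifying x correctly.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return the predicted label, +1 or -1."""
    return 1 if score(w, b, x) >= 0 else -1


def accuracy(w, b, features, labels):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(predict(w, b, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train_x, train_y = make_toy_features(40, seed=1)
    test_x, test_y = make_toy_features(20, seed=2)
    w, b = train_linear_svm(train_x, train_y)
    print("held-out accuracy:", accuracy(w, b, test_x, test_y))
```

In a real setting the toy generator would be replaced by feature vectors extracted from the pretrained network, and a library SVM with one classifier per category would normally be preferred over this hand-written trainer.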


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4da6/6802574/540e6ca485b8/syz014f1.jpg
