Robust and simplified machine learning identification of pitfall trap-collected ground beetles at the continental scale.

Authors

Blair Jarrett, Weiser Michael D, Kaspari Michael, Miller Matthew, Siler Cameron, Marshall Katie E

Affiliations

Department of Zoology University of British Columbia Vancouver BC Canada.

Department of Biology University of Oklahoma Norman OK USA.

Publication information

Ecol Evol. 2020 Nov 11;10(23):13143-13153. doi: 10.1002/ece3.6905. eCollection 2020 Dec.

Abstract

Insect populations are changing rapidly, and monitoring these changes is essential for understanding the causes and consequences of such shifts. However, large-scale insect identification projects are time-consuming and expensive when done solely by human identifiers. Machine learning offers a possible solution to help collect insect data quickly and efficiently.

Here, we outline a methodology for training classification models to identify pitfall trap-collected insects from image data and then apply the method to identify ground beetles (Carabidae). All beetles were collected by the National Ecological Observatory Network (NEON), a continental-scale ecological monitoring project with sites across the United States. We describe the procedures for image collection, image data extraction, data preparation, and model training, and compare the performance of five machine learning algorithms and two classification methods (hierarchical vs. single-level) identifying ground beetles from the species to subfamily level. All models were trained using pre-extracted feature vectors, not raw image data. Our methodology allows for data to be extracted from multiple individuals within the same image, thus enhancing time efficiency, utilizes relatively simple models that allow for direct assessment of model performance, and can be performed on relatively small datasets.

The best-performing algorithm, linear discriminant analysis (LDA), reached an accuracy of 84.6% at the species level when naively identifying species, which was further increased to >95% when classifications were limited by known local species pools. Model performance was negatively correlated with taxonomic specificity, with the LDA model reaching an accuracy of ~99% at the subfamily level. When classifying carabid species not included in the training dataset at higher taxonomic levels, the models performed significantly better than if classifications were made randomly. We also observed greater performance when classifications were made using the hierarchical classification method compared to the single-level classification method at higher taxonomic levels.

The general methodology outlined here serves as a proof-of-concept for classifying pitfall trap-collected organisms using machine learning algorithms, and the image data extraction methodology may be used for non-machine-learning uses. We propose that integration of machine learning in large-scale identification pipelines will increase efficiency and lead to a greater flow of insect macroecological data, with the potential to be expanded for use with other noninsect taxa.
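
The abstract reports that limiting classifications to known local species pools raised species-level accuracy, but it does not say how that limit was applied. The following is a minimal sketch of one plausible approach, assuming the classifier outputs a posterior probability per species: candidates outside the site's pool are discarded, the remaining probabilities are renormalized, and the most probable species is chosen. The function names, species labels, and probability values are hypothetical placeholders, not taken from the paper.

```python
def restrict_to_pool(posteriors, local_pool):
    """Keep only species in the local pool and renormalize their probabilities."""
    kept = {sp: p for sp, p in posteriors.items() if sp in local_pool}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no species from the local pool has nonzero probability")
    return {sp: p / total for sp, p in kept.items()}


def classify(posteriors, local_pool=None):
    """Return the most probable species, optionally limited to a local pool."""
    if local_pool is not None:
        posteriors = restrict_to_pool(posteriors, local_pool)
    return max(posteriors, key=posteriors.get)


# Hypothetical posterior probabilities from a classifier for one beetle image.
posteriors = {"species_a": 0.45, "species_b": 0.40, "species_c": 0.15}
# Hypothetical set of species known from the collection site.
site_pool = {"species_b", "species_c"}

naive = classify(posteriors)                  # considers every trained species
restricted = classify(posteriors, site_pool)  # considers only the site's pool
```

In this toy case the naive call picks the globally most probable label, while the restricted call can only return a species recorded at the site, which illustrates why a pool constraint might remove some misidentifications.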

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2b2/7713910/a27e0919ef4b/ECE3-10-13143-g001.jpg
