Maximizing citizen scientists' contribution to automated species recognition.

Affiliations

Department of Natural History, Norwegian University of Science and Technology, Trondheim, Norway.

Norwegian Biodiversity Information Centre, Trondheim, Norway.

Publication information

Sci Rep. 2022 May 10;12(1):7648. doi: 10.1038/s41598-022-11257-x.

Abstract

Technological advances and data availability have enabled artificial intelligence-driven tools that can increasingly successfully assist in identifying species from images. Especially within citizen science, an emerging source of information filling the knowledge gaps needed to solve the biodiversity crisis, such tools can allow participants to recognize and report more poorly known species. This can be an important tool in addressing the substantial taxonomic bias in biodiversity data, where broadly recognized, charismatic species are highly over-represented. Meanwhile, the recognition models are trained using the same biased data, so it is important to consider what additional images are needed to improve recognition models. In this study, we investigated how the amount of training data influenced the performance of species recognition models for various taxa. We utilized a large citizen science dataset collected in Norway, where images are added independently from identification. We demonstrate that while adding images of currently under-represented taxa will generally improve recognition models more, there are important deviations from this general pattern. Thus, a more focused prioritization of data collection beyond the basic paradigm that "more is better" is likely to significantly improve species recognition models and advance the representativeness of biodiversity data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a25f/9090737/b6fcafbbb27e/41598_2022_11257_Fig1_HTML.jpg
