Thompson Karen M, Turnbull Robert, Fitzgerald Emily, Birch Joanne L
University of Melbourne, Melbourne, Victoria, Australia.
Ecol Evol. 2023 Aug 14;13(8):e10395. doi: 10.1002/ece3.10395. eCollection 2023 Aug.
Advanced computer vision techniques hold the potential to mobilise vast quantities of biodiversity data by facilitating the rapid extraction of text- and trait-based data from digital images of herbarium specimens, and to increase the efficiency and accuracy of downstream data capture during digitisation. This investigation developed an object detection model using YOLOv5 and digitised collection images from the University of Melbourne Herbarium (MELU). The MELU-trained 'sheet-component' model, trained on 3371 annotated images, validated on 1000 annotated images, and run using the 'large' model type at a 640-pixel image size for 200 epochs, successfully identified most of the 11 component types on the digital specimen images, with an overall model precision of 0.983, recall of 0.969 and mean average precision (mAP@0.5:0.95) of 0.847. Specifically, 'institutional' and 'annotation' labels were predicted with mAP@0.5:0.95 of 0.970 and 0.878, respectively. It was found that annotating at least 2000 images was required to train an adequate model, likely due to the heterogeneity of specimen sheets. The full model was then applied to selected specimens from nine global herbaria to quantify its generalisability: for example, the 'institutional label' was identified with mAP@0.5:0.95 of between 0.68 and 0.89 across the various herbaria. Further detailed study demonstrated that starting from the MELU model weights and retraining for as few as 50 epochs on 30 additional annotated images was sufficient to enable the prediction of a previously unseen component type. As many herbaria are resource-constrained, the MELU-trained 'sheet-component' model weights are made available and their application is encouraged.
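For readers wishing to reproduce a comparable pipeline, the minimal sketch below uses the programmatic entry points of the ultralytics/yolov5 repository with the hyperparameters reported in the abstract ('large' weights, 640-pixel images, 200 epochs; then a 50-epoch fine-tune from the released weights). The file names (melu.yaml, new_herbarium.yaml, melu_best.pt, specimen_sheet.jpg) are hypothetical placeholders, not artefacts of the paper, and the keyword names follow the repository's train.py argument parser, which can vary slightly between YOLOv5 versions.

```python
# Sketch only: assumes the ultralytics/yolov5 repository is cloned,
# its dependencies installed, and this script runs from the repo root.
import train  # train.py from ultralytics/yolov5

# 1. Train the 'sheet-component' model from COCO-pretrained 'large'
#    weights: 640-pixel images, 200 epochs, as reported above.
train.run(
    data="melu.yaml",      # hypothetical dataset config (11 component classes)
    weights="yolov5l.pt",  # the 'large' model type
    imgsz=640,
    epochs=200,
)

# 2. Fine-tune from the released MELU weights so the model learns a
#    previously unseen component: ~30 extra annotated images, 50 epochs.
train.run(
    data="new_herbarium.yaml",  # hypothetical config adding the new class
    weights="melu_best.pt",     # placeholder name for the published weights
    imgsz=640,
    epochs=50,
)

# 3. Apply the trained weights to a new specimen sheet via torch.hub.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="melu_best.pt")
results = model("specimen_sheet.jpg")  # detect sheet components in one image
results.print()                        # summary of detected component boxes
```

In this sketch, step 2 illustrates the transfer-learning result described above: initialising from the MELU weights rather than generic COCO weights is what makes such a small annotation set (about 30 images) sufficient.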