Thompson Karen M, Turnbull Robert, Fitzgerald Emily, Birch Joanne L
University of Melbourne, Melbourne, Victoria, Australia.
Ecol Evol. 2023 Aug 14;13(8):e10395. doi: 10.1002/ece3.10395. eCollection 2023 Aug.
Advanced computer vision techniques hold the potential to mobilise vast quantities of biodiversity data by facilitating the rapid extraction of text- and trait-based data from digital images of herbarium specimens, and to increase the efficiency and accuracy of downstream data capture during digitisation. This investigation developed an object detection model using YOLOv5 and digitised collection images from the University of Melbourne Herbarium (MELU). The MELU-trained 'sheet-component' model, trained on 3371 annotated images, validated on 1000 annotated images, and run using the 'large' model type at a 640-pixel image size for 200 epochs, successfully identified most of the 11 component types on the digital specimen images, with an overall model precision of 0.983, recall of 0.969 and mean average precision (mAP@0.5:0.95) of 0.847. Specifically, 'institutional' and 'annotation' labels were predicted with mAP@0.5:0.95 of 0.970 and 0.878, respectively. It was found that annotating at least 2000 images was required to train an adequate model, likely due to the heterogeneity of specimen sheets. The full model was then applied to selected specimens from nine global herbaria to quantify its generalisability: for example, the 'institutional label' was identified with mAP@0.5:0.95 of between 0.68 and 0.89 across the various herbaria. Further detailed study demonstrated that starting from the MELU model weights and retraining for as few as 50 epochs on 30 additional annotated images was sufficient to enable the prediction of a previously unseen component type. As many herbaria are resource-constrained, the MELU-trained 'sheet-component' model weights are made available and their application is encouraged.
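For readers wishing to reproduce a comparable pipeline, the minimal sketch below uses the programmatic entry points of the ultralytics/yolov5 repository with the hyperparameters reported in the abstract ('large' weights, 640-pixel images, 200 epochs; then a 50-epoch fine-tune from the released weights). The file names (melu.yaml, new_herbarium.yaml, melu_best.pt, specimen_sheet.jpg) are hypothetical placeholders, not artefacts of the paper, and the keyword names follow the repository's train.py argument parser, which can vary slightly between YOLOv5 versions.

```python
# Sketch only: assumes the ultralytics/yolov5 repository is cloned,
# its dependencies installed, and this script runs from the repo root.
import train  # train.py from ultralytics/yolov5

# 1. Train the 'sheet-component' model from COCO-pretrained 'large'
#    weights: 640-pixel images, 200 epochs, as reported above.
train.run(
    data="melu.yaml",      # hypothetical dataset config (11 component classes)
    weights="yolov5l.pt",  # the 'large' model type
    imgsz=640,
    epochs=200,
)

# 2. Fine-tune from the released MELU weights so the model learns a
#    previously unseen component: ~30 extra annotated images, 50 epochs.
train.run(
    data="new_herbarium.yaml",  # hypothetical config adding the new class
    weights="melu_best.pt",     # placeholder name for the published weights
    imgsz=640,
    epochs=50,
)

# 3. Apply the trained weights to a new specimen sheet via torch.hub.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="melu_best.pt")
results = model("specimen_sheet.jpg")  # detect sheet components in one image
results.print()                        # summary of detected component boxes
```

In this sketch, step 2 illustrates the transfer-learning result described above: initialising from the MELU weights rather than generic COCO weights is what makes such a small annotation set (about 30 images) sufficient.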