Toward a service-based workflow for automated information extraction from herbarium specimens.

Affiliations

Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Königin-Luise-Str. Berlin, Germany.

Fraunhofer-Institute of Optronics, System Technologies and Image Exploitation, Fraunhofer Str. Karlsruhe, Germany.

Publication information

Database (Oxford). 2018 Jan 1;2018:bay103. doi: 10.1093/database/bay103.

PMID: 30295725

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6174549/

Abstract

Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow that integrates a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine, together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) text from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from the OCR output. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project 'A standardised and optimised process for data acquisition from digital images of herbarium specimens' (StanDAP-Herb).
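The three stages named in the abstract (OCR, region-of-interest detection, information extraction) can be pictured as a chain of services that each add fields to a specimen record. The following is only a minimal sketch under stated assumptions: it does not use OpenRefine or the authors' plugin, the service functions are local stubs rather than real web services, and all names (`SpecimenRecord`, `stub_ocr_service`, `stub_roi_service`, `extract_items_service`, `run_workflow`), the sample label text, and the regular expressions are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecimenRecord:
    """One specimen image plus whatever the services have derived so far."""
    image_url: str
    data: Dict[str, object] = field(default_factory=dict)


# A "service" here is any callable that reads a record and returns new fields.
# In a real deployment this would wrap a remote call; here it is a local stub.
Service = Callable[[SpecimenRecord], Dict[str, object]]


def stub_ocr_service(record: SpecimenRecord) -> Dict[str, object]:
    # Hypothetical stand-in for an OCR service: returns fixed sample text.
    return {"ocr_text": "Herbarium Berolinense\nLeg. A. Example, 12 May 1931"}


def stub_roi_service(record: SpecimenRecord) -> Dict[str, object]:
    # Hypothetical stand-in for region-of-interest detection.
    return {"regions": [{"kind": "label", "bbox": (10, 20, 300, 120)}]}


def extract_items_service(record: SpecimenRecord) -> Dict[str, object]:
    # Illustrative extraction only: pull a four-digit year and the name
    # following "Leg." out of the OCR text with simple regular expressions.
    text = str(record.data.get("ocr_text", ""))
    year = re.search(r"\b(1[6-9]\d{2}|20\d{2})\b", text)
    collector = re.search(r"Leg\.\s*([^,\n]+)", text)
    return {
        "year": year.group(1) if year else None,
        "collector": collector.group(1).strip() if collector else None,
    }


def run_workflow(record: SpecimenRecord, services: List[Service]) -> SpecimenRecord:
    """Call each service in order and merge its response into the record."""
    for service in services:
        record.data.update(service(record))
    return record


if __name__ == "__main__":
    result = run_workflow(
        SpecimenRecord("https://example.org/specimen.jpg"),  # placeholder URL
        [stub_ocr_service, stub_roi_service, extract_items_service],
    )
    print(result.data["collector"], result.data["year"])
```

Because each stage only reads and extends the shared record, a new analysis service could be appended to the list without changing the others, which is one way to read the "extensible" property the abstract describes.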

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37e2/6174549/aa262153fea3/bay103f1.jpg
