Toward a service-based workflow for automated information extraction from herbarium specimens.

Affiliations

Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Königin-Luise-Str. Berlin, Germany.

Fraunhofer-Institute of Optronics, System Technologies and Image Exploitation, Fraunhofer Str. Karlsruhe, Germany.

Publication information

Database (Oxford). 2018 Jan 1;2018:bay103. doi: 10.1093/database/bay103.

Abstract

Over the past years, herbarium collections worldwide have started to digitize millions of specimens on an industrial scale. Although imaging costs are steadily falling, capturing the accompanying label information is still predominantly done manually and is becoming the principal cost factor. To streamline the capture of herbarium specimen metadata, we specified a formal, extensible workflow that integrates a wide range of automated specimen image analysis services. We implemented the workflow on the basis of OpenRefine, together with a plugin for handling service calls and responses. The evolving system presently covers the generation of optical character recognition (OCR) text from specimen images, the identification of regions of interest in images and the extraction of meaningful information items from the OCR output. These implementations were developed as part of the Deutsche Forschungsgemeinschaft-funded project "A standardised and optimised process for data acquisition from digital images of herbarium specimens" (StanDAP-Herb).
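
The idea of chaining independent analysis services over a specimen record can be illustrated with a minimal sketch. This is not the StanDAP-Herb implementation or the OpenRefine plugin API: every name below (`ocr_service`, `extract_year_service`, `run_workflow`, `FAKE_OCR`) is hypothetical, the OCR step is a stubbed lookup rather than real character recognition, and the label text is invented sample data.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, object]
Service = Callable[[Record], Record]

# Invented sample data standing in for OCR output (assumption, not real results).
FAKE_OCR = {"specimen-001.jpg": "Herbarium Berolinense\nleg. A. Example 12 May 1907"}


def ocr_service(record: Record) -> Record:
    """Attach OCR text for the record's image (stubbed lookup, not real OCR)."""
    text = FAKE_OCR.get(str(record["image"]), "")
    return {**record, "ocr_text": text}


def extract_year_service(record: Record) -> Record:
    """Extract the first four-digit year (1700-2099) from the OCR text, if any."""
    match = re.search(r"\b(1[7-9]\d{2}|20\d{2})\b", str(record.get("ocr_text", "")))
    return {**record, "year": int(match.group(1)) if match else None}


def run_workflow(record: Record, services: List[Service]) -> Record:
    """Apply each service in order; every step returns an enriched copy."""
    for service in services:
        record = service(record)
    return record


result = run_workflow(
    {"image": "specimen-001.jpg"},
    [ocr_service, extract_year_service],
)
print(result["year"])  # prints 1907
```

Because each service only reads a record and returns an enriched copy, further steps (for example, a region-of-interest detector) could be appended to the list without changing the existing ones, which is one simple way to read "extensible" here.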

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/37e2/6174549/aa262153fea3/bay103f1.jpg
