Institute of Natural Science and Environment, University of Hyogo/The Museum of Nature and Human Activities, Hyogo, 6 Chome, Yayoigaoka, Sanda, Hyogo, 669-1546, Japan.
Institute of Biology, Dahlem Center of Plant Sciences, Freie Universität Berlin, Altensteinstrasse 6, 14195, Berlin, Germany.
Sci Rep. 2024 Jan 2;14(1):112. doi: 10.1038/s41598-023-50179-0.
Digital extraction of label data from natural history specimens, together with more efficient data entry and processing procedures, is essential for improving documentation and the global availability of information. Herbaria have recently made great advances in this direction. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we have made further progress towards fully automatic extraction of label data from herbarium specimen images. The resulting system can be developed and run on a consumer-grade desktop computer with standard specifications, and it can also be applied to extracting label data from other kinds of natural history specimens, such as those in entomological collections. It can thereby facilitate the digitization and publication of natural history museum specimens around the world.
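The abstract does not say which OCR engine or NER model the system uses, so the following is only a minimal sketch of where the NER step fits in such a pipeline. It assumes OCR has already turned a label image into plain text, and it substitutes a few hand-written regular-expression rules for a trained NER model. The entity types (`DATE`, `ALTITUDE`, `COLLECTOR`), the patterns, the helper names `extract_entities` and `to_record`, and the sample label are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    label: str  # entity type, e.g. "DATE"
    text: str   # matched span of the OCR text
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


# Hypothetical rules standing in for a trained NER model.
# Each pattern captures the entity value in the named group "v".
PATTERNS = {
    "DATE": re.compile(
        r"\b(?P<v>\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
        r"[a-z]*\.? \d{4})\b"
    ),
    "ALTITUDE": re.compile(r"\b(?P<v>\d+ ?m)\b"),
    "COLLECTOR": re.compile(r"\bleg\.\s*(?P<v>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}


def extract_entities(ocr_text: str) -> list[Entity]:
    """Tag every rule match in the OCR text and return spans in reading order."""
    entities = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(ocr_text):
            entities.append(Entity(label, m.group("v"), m.start("v"), m.end("v")))
    return sorted(entities, key=lambda e: e.start)


def to_record(entities: list[Entity]) -> dict[str, str]:
    """Collapse tagged spans into one record, keeping the first match per type."""
    record: dict[str, str] = {}
    for e in entities:
        record.setdefault(e.label, e.text)
    return record


if __name__ == "__main__":
    # Made-up OCR output for a specimen label.
    label = "Japan. Hyogo Pref., Sanda-shi, alt. 150 m, 12 May 2015, leg. Taro Suzuki"
    print(to_record(extract_entities(label)))
    # {'ALTITUDE': '150 m', 'DATE': '12 May 2015', 'COLLECTOR': 'Taro Suzuki'}
```

Hand-written rules like these break easily on the abbreviations, mixed languages and handwriting found on real labels, which is one reason to use a learned NER model that tags spans from annotated examples instead. The output shape, labelled character spans collapsed into one record per specimen, would stay the same either way.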