A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER.

Affiliations

Institute of Natural Science and Environment, University of Hyogo/The Museum of Nature and Human Activities, Hyogo, 6 Chome, Yayoigaoka, Sanda, Hyogo, 669-1546, Japan.

Institute of Biology, Dahlem Center of Plant Sciences, Freie Universität Berlin, Altensteinstrasse 6, 14195, Berlin, Germany.

Publication information

Sci Rep. 2024 Jan 2;14(1):112. doi: 10.1038/s41598-023-50179-0.

DOI:10.1038/s41598-023-50179-0
PMID:38167449
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10761843/
Abstract

Digital extraction of label data from natural history specimens along with more efficient procedures of data entry and processing is essential for improving documentation and global information availability. Herbaria have made great advances in this direction lately. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we have been able to make further advancements towards fully automatic extraction of label data from herbarium specimen images. This system can be developed and run on a consumer grade desktop computer with standard specifications, and can also be applied to extracting label data from diverse kinds of natural history specimens, such as those in entomological collections. This system can facilitate the digitization and publication of natural history museum specimens around the world.
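
The abstract names the stages (OCR, then NER, then database generation) but not the specific engines or models used. As a rough illustration only, the sketch below assumes the OCR step has already produced a text string for one label, substitutes simple regular-expression rules for a trained NER model, and writes the result to an in-memory SQLite table. The sample label text, field names, patterns, and table layout are all hypothetical and are not taken from the paper.

```python
import re
import sqlite3

# Hypothetical OCR output for one label (made-up sample data). A real
# pipeline would obtain this string from an OCR engine run on the image.
OCR_TEXT = """HERBARIUM OF EXAMPLE
Quercus serrata Murray
Hyogo Pref., Sanda, Yayoigaoka
Coll. T. Yamada  No. 1234
12 May 1998"""

# Rule-based stand-in for NER: each pattern tags one kind of entity.
# A trained NER model would replace these rules in a real system.
PATTERNS = {
    "scientific_name": re.compile(r"^([A-Z][a-z]+ [a-z]+)\b", re.MULTILINE),
    "collector": re.compile(r"Coll\.\s+([A-Z]\.\s*[A-Z][a-z]+)"),
    "collection_date": re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b"),
}


def extract_entities(text):
    """Return a dict of field name -> first match, or None when absent."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1) if match else None
    return record


def store_record(conn, image_id, record):
    """Create the (hypothetical) specimens table if needed and upsert one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS specimens ("
        "image_id TEXT PRIMARY KEY, scientific_name TEXT, "
        "collector TEXT, collection_date TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO specimens VALUES (?, ?, ?, ?)",
        (
            image_id,
            record["scientific_name"],
            record["collector"],
            record["collection_date"],
        ),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
record = extract_entities(OCR_TEXT)
store_record(conn, "sheet_0001", record)
rows = conn.execute("SELECT * FROM specimens").fetchall()
print(rows)
```

Keeping extraction and storage in separate functions means the rule-based tagger could be swapped for a statistical NER model without touching the database code.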

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a305/10761843/de5271081256/41598_2023_50179_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a305/10761843/70b86b6e7788/41598_2023_50179_Fig2_HTML.jpg
