Hespi: a pipeline for automatically detecting information from herbarium specimen sheets.

Author information

Turnbull Robert, Fitzgerald Emily, Thompson Karen M, Birch Joanne L

Affiliations

Melbourne Data Analytics Platform.

School of BioSciences at the University of Melbourne, Melbourne, Victoria, Australia.

Publication information

Bioscience. 2025 Jul 17;75(8):637-648. doi: 10.1093/biosci/biaf042. eCollection 2025 Aug.

DOI:10.1093/biosci/biaf042

PMID:40821888

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12352312/

Abstract

Specimen-associated biodiversity data are crucial for biological, environmental, and conservation sciences. A rate shift is needed to extract data from specimen images efficiently, moving beyond human-mediated transcription. We developed Hespi using advanced computer vision techniques to extract authoritative data applicable for a range of research purposes from primary specimen labels on herbarium specimens. Hespi integrates two object detection models: one for detecting the components of the sheet and another for fields on the primary specimen label. It classifies labels as printed, typed, handwritten, or mixed and uses optical character recognition and handwritten text recognition for extraction. The text is then corrected against authoritative taxon databases and refined using a multimodal large language model. Hespi accurately detects and extracts text from specimen sheets across international herbaria, and its modular design allows users to train and integrate custom models.
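The abstract mentions correcting recognized text against authoritative taxon databases. As a rough illustration of that one step only, the sketch below does fuzzy matching with Python's standard `difflib`. It is not Hespi's actual implementation: the function name `correct_against_reference`, the `REFERENCE_GENERA` list, and the 0.8 similarity cutoff are all assumptions made for this example.

```python
from difflib import get_close_matches

# Hypothetical reference list (an assumption for this sketch); a real run
# would load names from an authoritative taxon database export.
REFERENCE_GENERA = ["Eucalyptus", "Acacia", "Banksia", "Grevillea"]


def correct_against_reference(text, reference, cutoff=0.8):
    """Return the closest reference name, or the input unchanged if none is close enough."""
    if text in reference:
        return text
    # Compare case-insensitively so OCR case slips do not block a match.
    lowered = {name.lower(): name for name in reference}
    matches = get_close_matches(text.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else text
```

For instance, `correct_against_reference("Eucalyptns", REFERENCE_GENERA)` returns `"Eucalyptus"`, while a string with no close match in the list is returned unchanged, so an uncertain reading is never silently replaced.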

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c660/12352312/50867c4bda09/biaf042fig1.jpg
