SpeciMate: Improving metadata extraction from digitised biological specimens.

Authors

Stenhouse Alan, Thrall Peter H

Affiliations

CSIRO, Canberra, Australia.

Publication information

Biodivers Data J. 2025 Jul 31;13:e160553. doi: 10.3897/BDJ.13.e160553. eCollection 2025.

Abstract

BACKGROUND

The digitisation of natural history collections represents a critical step towards preserving and increasing accessibility to valuable scientific data. Despite their fundamental importance to taxonomy, ecology and conservation, the world's natural history collections remain underutilised due to the labour-intensive process of extracting metadata from specimen labels.

NEW INFORMATION

This paper describes SpeciMate, a software application that uses a human-AI collaborative approach to accelerate the extraction of metadata from digitised specimen images. The system leverages artificial intelligence web services including optical character recognition (OCR), automated translation and large language and multimodal models (LLMs) to extract structured metadata, while requiring human expertise for prompt engineering and data curation. We describe the application's architecture, functionality and workflows, which enable effective processing of various specimen types including herbarium sheets and insect slides. Our trials indicate that this tool significantly improves the efficiency of metadata extraction while maintaining high data quality. The combination of automated AI processing with human supervision and refinement represents a promising approach to accelerating the digitisation and databasing of natural history collections, thereby enabling broader access to these invaluable resources for research, education and conservation efforts.
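
The workflow described above (OCR, then translation, then LLM-based extraction, followed by human curation) can be illustrated with a minimal sketch. This is not SpeciMate's actual implementation: the abstract does not give its architecture in detail, so every name below (`SpecimenRecord`, `run_pipeline`, `curate`, the field names, and the stub services) is a hypothetical stand-in chosen for illustration. The three AI services are passed in as plain functions so that real web-service clients could be swapped in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecimenRecord:
    """Hypothetical container for one digitised specimen image."""
    image_id: str
    raw_text: str = ""
    translated_text: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    needs_review: List[str] = field(default_factory=list)
    approved: bool = False


def run_pipeline(
    image_id: str,
    ocr: Callable[[str], str],
    translate: Callable[[str], str],
    extract: Callable[[str], Dict[str, str]],
    required_fields: List[str],
) -> SpecimenRecord:
    """Run the automated stages and flag missing fields for a human."""
    record = SpecimenRecord(image_id=image_id)
    record.raw_text = ocr(image_id)
    record.translated_text = translate(record.raw_text)
    record.metadata = extract(record.translated_text)
    # Any required field the automated stages left empty goes to the curator.
    record.needs_review = [f for f in required_fields if not record.metadata.get(f)]
    return record


def curate(record: SpecimenRecord, corrections: Dict[str, str]) -> SpecimenRecord:
    """Apply human corrections; approve once no required field is missing."""
    record.metadata.update(corrections)
    record.needs_review = [f for f in record.needs_review if not record.metadata.get(f)]
    record.approved = not record.needs_review
    return record


# Stub services standing in for real OCR, translation and LLM calls (assumed).
def fake_ocr(image_id: str) -> str:
    return "Collector: J. Smith\nDate: 1923-05-14"


def fake_translate(text: str) -> str:
    return text  # label text is already in English in this toy example


def fake_extract(text: str) -> Dict[str, str]:
    # Toy parser: treats each "Key: value" line as one metadata field.
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip().lower()] = value.strip()
    return result


record = run_pipeline(
    "sheet-001", fake_ocr, fake_translate, fake_extract,
    required_fields=["collector", "date", "locality"],
)
print(record.needs_review)  # ['locality']

record = curate(record, {"locality": "Canberra"})
print(record.approved)  # True
```

Keeping the human step as a separate function reflects the collaborative design the abstract emphasises: the automated stages never mark a record as approved on their own.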

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e76a/12332497/2cfc30fadf99/bdj-13-e160553-g001.jpg