Fully automatic extraction of morphological traits from the web: Utopia or reality?

Author information

Marcos Diego, van de Vlasakker Robert, Athanasiadis Ioannis N, Bonnet Pierre, Goëau Hervé, Joly Alexis, Kissling W Daniel, Leblanc César, van Proosdij André S J, Panousis Konstantinos P

Affiliations

INRIA, TETIS, University of Montpellier, Montpellier, France.

University of Montpellier, Montpellier, France.

Publication information

Appl Plant Sci. 2025 Jun 1;13(3):e70005. doi: 10.1002/aps3.70005. eCollection 2025 May-Jun.

DOI:10.1002/aps3.70005

PMID:40575552

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12188617/

Abstract

PREMISE

Plant morphological traits, their observable characteristics, are fundamental to understanding the role played by each species within its ecosystem; however, compiling trait information for even a moderate number of species is a demanding task that may take experts years to accomplish. At the same time, online species descriptions contain massive amounts of information about morphological traits, but the lack of structure makes this source of data impossible to use at scale.

METHODS

To overcome this, we propose to leverage recent advances in large language models and devise a mechanism for gathering and processing plant trait information in the form of unstructured textual descriptions, without manual curation.

RESULTS

We evaluate our approach by automatically replicating three manually created species-trait matrices. Our method found values for over half of all species-trait pairs, with an F1 score of over 75%.
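The two figures reported here, the share of species-trait pairs for which a value was found and the F1 score, can be illustrated with a minimal sketch. The abstract does not specify the scoring protocol, so this is an assumption: F1 is computed over trait values only for pairs where the method returned something. The function name `coverage_and_f1`, the dictionary layout, and the toy species and traits are hypothetical and do not come from the paper.

```python
def coverage_and_f1(reference, predicted):
    """Score predicted trait values against a manually built matrix.

    Both arguments map (species, trait) -> set of categorical values.
    Assumption: F1 is computed only on pairs that received a prediction.
    """
    # Pairs for which the method returned at least one value.
    answered = [pair for pair in reference if predicted.get(pair)]
    coverage = len(answered) / len(reference) if reference else 0.0

    tp = fp = fn = 0
    for pair in answered:
        ref_vals, pred_vals = reference[pair], predicted[pair]
        tp += len(ref_vals & pred_vals)   # values found and correct
        fp += len(pred_vals - ref_vals)   # values found but wrong
        fn += len(ref_vals - pred_vals)   # correct values that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return coverage, precision, recall, f1


# Toy data, invented for illustration only.
reference = {
    ("Species A", "leaf shape"): {"ovate"},
    ("Species A", "flower colour"): {"white", "yellow"},
    ("Species B", "leaf shape"): {"linear"},
    ("Species B", "flower colour"): {"red"},
}
predicted = {
    ("Species A", "leaf shape"): {"ovate"},
    ("Species A", "flower colour"): {"white", "pink"},
    ("Species B", "leaf shape"): set(),  # description found, no value extracted
}

coverage, precision, recall, f1 = coverage_and_f1(reference, predicted)
print(f"coverage={coverage:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f}")
```

Separating coverage from F1 keeps the two limits apart: missing or incomplete descriptions lower coverage, while extraction mistakes lower F1.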

DISCUSSION

Our results suggest that large-scale creation of structured trait databases from unstructured online text is now feasible due to the information extraction capabilities of large language models. However, the process is currently limited by the availability of textual descriptions that cover all traits of interest.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/71ad/12188617/79c11b7404f9/APS3-13-e70005-g005.jpg
