Unsupervised literature mining approaches for extracting relationships pertaining to habitats and reproductive conditions of plant species.

Author information

Gabud Roselyn, Lapitan Portia, Mariano Vladimir, Mendoza Eduardo, Pampolina Nelson, Clariño Maria Art Antonette, Batista-Navarro Riza

Affiliations

Department of Computer Science, College of Engineering, University of the Philippines Diliman, Quezon City, Philippines.

Institute of Computer Science, College of Arts and Sciences, University of the Philippines Los Baños, Laguna, Philippines.

Publication information

Front Artif Intell. 2024 May 23;7:1371411. doi: 10.3389/frai.2024.1371411. eCollection 2024.

Abstract

INTRODUCTION

Fine-grained, descriptive information on the habitats and reproductive conditions of plant species is crucial to forest restoration and rehabilitation efforts. Precise timing of fruit collection and knowledge of species' habitat preferences and reproductive status are necessary, especially for tropical plant species that have short-lived recalcitrant seeds and for those that exhibit complex reproductive patterns, e.g., species with supra-annual mass flowering events that may occur at irregular intervals. Understanding plant regeneration in order to plan effective reforestation can be aided by access to structured information, e.g., in knowledge bases, that spans years if not decades and covers a wide range of geographic locations. The content of such a resource can be enriched with literature-derived information on species' time-sensitive reproductive conditions and location-specific habitats.

METHODS

We sought to develop unsupervised approaches to extract two kinds of relationships: those between habitats and their geographic locations, and those between reproductive conditions of plant species and the corresponding temporal information. First, we handcrafted rules for a traditional rule-based pattern-matching approach. We then developed a relation extraction approach that builds upon a transformer model, namely the Text-to-Text Transfer Transformer (T5), casting relation extraction as a question answering and natural language inference task. Finally, we propose a novel unsupervised hybrid approach that combines our rule-based and transformer-based approaches.
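The three steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, since the abstract does not give the authors' rules, prompts, or combination strategy: the vocabulary in CONDITION and MONTH, the single pattern RULE, the question wording in qa_prompt, and the choice to combine extractors by set union in hybrid are all hypothetical. The prompt prefixes follow the input formats that the original T5 release used for its SQuAD and MNLI tasks; whether the paper used these exact formats is not stated. The transformer itself is left as a pluggable callable so that the sketch runs without any model download.

```python
import re
from typing import Callable, Iterable, Set, Tuple

# A relation is a (reproductive condition, temporal expression) pair.
Relation = Tuple[str, str]

# Hypothetical vocabularies, chosen only for illustration (not the authors' rules).
CONDITION = r"(?P<cond>flowering|fruiting|flowers|fruits)"
MONTH = (
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)"
)
TEMPORAL = rf"(?P<time>{MONTH}(?:\s+(?:to|and)\s+{MONTH})?)"

# One hand-written pattern: a condition term followed, within the same
# sentence, by a preposition and a month or month range.
RULE = re.compile(
    rf"\b{CONDITION}\b[^.]*?\b(?:in|from|during|between)\s+{TEMPORAL}\b",
    re.IGNORECASE,
)


def rule_based(text: str) -> Set[Relation]:
    """Extract relations with the hand-written pattern above."""
    return {(m.group("cond").lower(), m.group("time")) for m in RULE.finditer(text)}


def qa_prompt(condition: str, context: str) -> str:
    """Cast extraction as question answering (T5 SQuAD-style input string)."""
    return f"question: When does {condition} occur? context: {context}"


def nli_prompt(premise: str, hypothesis: str) -> str:
    """Cast verification of a candidate relation as NLI (T5 MNLI-style input string)."""
    return f"mnli hypothesis: {hypothesis} premise: {premise}"


def hybrid(
    text: str, model_extractor: Callable[[str], Iterable[Relation]]
) -> Set[Relation]:
    """Combine both extractors by set union (an assumed combination strategy)."""
    return rule_based(text) | set(model_extractor(text))


if __name__ == "__main__":
    sample = "Flowering occurs from March to May. Fruits ripen in August."
    # Stand-in for a transformer-based extractor; a real one would feed
    # qa_prompt / nli_prompt strings to a T5 model and parse its output.
    stub_model = lambda t: [("fruiting", "the rainy season")]
    print(sorted(hybrid(sample, stub_model)))
```

A union can only add relations to what either extractor finds alone, which is one plausible way a hybrid could raise recall; it can also admit more false positives, which is why a verification step such as the NLI prompt might be applied to candidates before they are accepted.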

RESULTS

Evaluation on an annotated corpus of biodiversity-focused documents showed that our hybrid approach improved recall by up to 15 percentage points and outperformed the solely rule-based and solely transformer-based methods, with F1-scores ranging from 89.61% to 96.75% for reproductive condition - temporal expression relations and from 85.39% to 89.90% for habitat - geographic location relations. Our work shows that, even without training models on any domain-specific labeled dataset, relationships between biodiversity concepts can be extracted from literature with satisfactory performance.
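For readers unfamiliar with the metrics reported above, the following sketch shows how precision, recall, and F1 are conventionally computed over sets of extracted relations, and how a recall difference is expressed in percentage points. It assumes exact matching of relation pairs, which the abstract does not specify, and the gold and predicted sets are invented toy data, not results from the paper.

```python
from typing import Set, Tuple

Relation = Tuple[str, str]


def precision_recall_f1(
    gold: Set[Relation], predicted: Set[Relation]
) -> Tuple[float, float, float]:
    """Score predicted relations against gold annotations by exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only (hypothetical annotations and outputs).
    gold = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")}
    rules_only = {("a", "1"), ("b", "2")}
    hybrid_out = {("a", "1"), ("b", "2"), ("c", "3"), ("e", "5")}

    _, recall_rules, _ = precision_recall_f1(gold, rules_only)
    _, recall_hybrid, _ = precision_recall_f1(gold, hybrid_out)
    # A recall difference is reported in percentage points, not as a ratio.
    print(f"recall gain: {(recall_hybrid - recall_rules) * 100:.0f} percentage points")
```

In this toy case the hybrid output gains recall at the cost of one false positive, which is the trade-off F1 is meant to summarize.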

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/931c/11153722/c85ee04054d7/frai-07-1371411-g0001.jpg
