Unsupervised literature mining approaches for extracting relationships pertaining to habitats and reproductive conditions of plant species.

Author information

Gabud Roselyn, Lapitan Portia, Mariano Vladimir, Mendoza Eduardo, Pampolina Nelson, Clariño Maria Art Antonette, Batista-Navarro Riza

Affiliations

Department of Computer Science, College of Engineering, University of the Philippines Diliman, Quezon City, Philippines.

Institute of Computer Science, College of Arts and Sciences, University of the Philippines Los Baños, Laguna, Philippines.

Publication information

Front Artif Intell. 2024 May 23;7:1371411. doi: 10.3389/frai.2024.1371411. eCollection 2024.

DOI:10.3389/frai.2024.1371411

PMID:38845683

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11153722/

Abstract

INTRODUCTION

Fine-grained, descriptive information on the habitats and reproductive conditions of plant species is crucial in forest restoration and rehabilitation efforts. Precise timing of fruit collection and knowledge of species' habitat preferences and reproductive status are necessary, especially for tropical plant species that have short-lived recalcitrant seeds and for those that exhibit complex reproductive patterns, e.g., species with supra-annual mass flowering events that may occur at irregular intervals. Understanding plant regeneration when planning for effective reforestation can be aided by access to structured information, e.g., in knowledge bases, that spans years if not decades and covers a wide range of geographic locations. The content of such a resource can be enriched with literature-derived information on species' time-sensitive reproductive conditions and location-specific habitats.

METHODS

We sought to develop unsupervised approaches to extract relationships pertaining to habitats and their locations, and to reproductive conditions of plant species and the corresponding temporal information. First, we handcrafted rules for a traditional rule-based pattern matching approach. We then developed a relation extraction approach that builds upon transformer models, specifically the Text-to-Text Transfer Transformer (T5), casting the relation extraction problem as a question answering and natural language inference task. Finally, we propose a novel unsupervised hybrid approach that combines our rule-based and transformer-based approaches.
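The three approaches can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expression, the small term lists, the question wording, and the function names are all hypothetical, and the abstract does not say how the rule-based and transformer-based outputs are combined, so the set union used here is an assumption. The prompt prefixes follow the input conventions described for T5 (question/context for question answering, hypothesis/premise for natural language inference). The transformer step is passed in as a callable so that the sketch runs without downloading a model.

```python
import re
from typing import Callable, Set, Tuple

# (reproductive condition, temporal expression)
Relation = Tuple[str, str]

# Hypothetical mini-lexicon of month names; a real system would rely on
# annotated or automatically recognised entity mentions instead.
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")

# One handcrafted rule (an assumption, for illustration only):
# "<condition> ... in|during|from <month>[ to <month>]" within one clause.
RULE = re.compile(
    r"\b(?P<cond>flowering|fruiting)\b[^.;]*?\b(?:in|during|from)\s+"
    r"(?P<time>" + MONTH + r"(?:\s+to\s+" + MONTH + r")?)",
    re.IGNORECASE,
)


def rule_based(sentence: str) -> Set[Relation]:
    """Return the relations matched by the handcrafted pattern."""
    return {(m.group("cond").lower(), m.group("time"))
            for m in RULE.finditer(sentence)}


def qa_prompt(condition: str, sentence: str) -> str:
    """Cast relation extraction as question answering (T5-style input)."""
    return f"question: When does {condition} occur? context: {sentence}"


def nli_prompt(condition: str, time: str, sentence: str) -> str:
    """Cast a candidate relation as natural language inference (T5-style input)."""
    hypothesis = f"{condition.capitalize()} occurs in {time}."
    return f"mnli hypothesis: {hypothesis} premise: {sentence}"


def hybrid(sentence: str,
           transformer_extract: Callable[[str], Set[Relation]]) -> Set[Relation]:
    """Combine both extractors; taking the union is an assumption of this sketch."""
    return rule_based(sentence) | transformer_extract(sentence)


sentence = "Flowering was observed from March to May, while fruits ripen by June."


def stub_transformer(text: str) -> Set[Relation]:
    # Stand-in for a T5-based extractor; returns a fixed, made-up answer.
    return {("fruiting", "June")}


print(rule_based(sentence))
print(qa_prompt("fruiting", sentence))
print(nli_prompt("fruiting", "June", sentence))
print(hybrid(sentence, stub_transformer))
```

In this toy example the rule misses the second clause because "fruits ... by June" does not fit the pattern, while the stand-in transformer step supplies it. This is the kind of gap a hybrid approach is meant to cover.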

RESULTS

Evaluation of our hybrid approach on an annotated corpus of biodiversity-focused documents demonstrated an improvement of up to 15 percentage points in recall and the best performance over solely rule-based and solely transformer-based methods, with F1-scores ranging from 89.61% to 96.75% for reproductive condition - temporal expression relations and from 85.39% to 89.90% for habitat - geographic location relations. Our work shows that, even without training models on any domain-specific labeled dataset, we are able to extract relationships between biodiversity concepts from literature with satisfactory performance.
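For readers unfamiliar with the metrics, the following sketch shows how precision, recall, and F1-score can be computed for extracted relations by comparing a set of predicted relations against a set of gold-standard annotations. The relation tuples are made up for illustration and are not taken from the paper's corpus or results; exact-match comparison of tuples is also an assumption, since the abstract does not describe the matching criteria.

```python
from typing import Dict, Set, Tuple

Relation = Tuple[str, str]


def prf1(gold: Set[Relation], predicted: Set[Relation]) -> Dict[str, float]:
    """Precision, recall and F1 under exact matching of relation tuples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up gold annotations and system outputs (illustrative only).
gold = {
    ("flowering", "March to May"),
    ("fruiting", "June"),
    ("flowering", "1998"),
    ("fruiting", "dry season"),
}
rule_only = {("flowering", "March to May"), ("fruiting", "June")}
hybrid_output = rule_only | {("flowering", "1998"), ("fruiting", "July")}

print("rule-based:", prf1(gold, rule_only))
print("hybrid:    ", prf1(gold, hybrid_output))
```

In this toy case the hybrid output recovers one more gold relation than the rules alone (recall rises from 0.50 to 0.75) at the cost of one spurious relation, which shows how a gain in recall can raise the F1-score even when precision drops.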
