A corpus for plant-chemical relationships in the biomedical domain.

Author information

Choi Wonjun, Kim Baeksoo, Cho Hyejin, Lee Doheon, Lee Hyunju

Affiliations

School of Information and Communications, Gwangju Institute of Science and Technology, Chemdangwagi-ro, Gwangju, Republic of Korea.

Department of Bio and Brain Engineering, KAIST, Yuseong-gu, Daejeon, Republic of Korea.

Publication information

BMC Bioinformatics. 2016 Sep 20;17:386. doi: 10.1186/s12859-016-1249-5.

PMID:27650402

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5029005/

Abstract

BACKGROUND

Plants are natural products that humans consume in various ways, including as food and medicine. They have a long empirical history of use in treating diseases, with relatively few side effects. Based on these strengths, many studies have been performed to verify the effectiveness of plants in treating diseases. It is crucial to understand the chemicals contained in plants because these chemicals can regulate the activities of proteins that are key factors in causing diseases. With a large volume of biomedical literature accumulated in databases such as PubMed, a text mining approach makes it possible to extract relationships between plants and chemicals automatically and at large scale. A cornerstone of this task is a corpus of relationships between plants and chemicals.

RESULTS

In this study, we first constructed a corpus of plant and chemical entities and the relationships between them. The corpus contains 267 plant entities, 475 chemical entities, and 1,007 plant-chemical relationships (550 positive and 457 negative), drawn from 377 sentences in 245 PubMed abstracts. Inter-annotator agreement among three annotators was measured for the corpus. The simple percent agreement scores for entities and for relationship trigger words were 99.6% and 94.8%, respectively, and the overall kappa score for classifying relationships as positive or negative was 79.8%. We also developed a rule-based model to automatically extract such plant-chemical relationships. Evaluated on the corpus and on randomly selected biomedical articles, the rule-based model achieved overall F-scores of 68.0% and 61.8%, respectively.
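The abstract reports a simple percent agreement and an overall kappa for three annotators but does not say how either was computed. As a minimal sketch, assuming that percent agreement is averaged over annotator pairs and that the kappa is Fleiss' kappa (both are assumptions, not confirmed by the source), the two statistics could be computed as follows. The labels "pos"/"neg" and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence

Ratings = Sequence[Sequence[Hashable]]  # ratings[i] = labels all annotators gave item i


def pairwise_percent_agreement(ratings: Ratings) -> float:
    """Share of agreeing annotator pairs per item, averaged over items."""
    per_item = []
    for labels in ratings:
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def fleiss_kappa(ratings: Ratings) -> float:
    """Fleiss' kappa, assuming every item has the same number of annotators."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(labels) for labels in ratings]
    # Observed agreement: P_i per item, averaged into P_bar.
    p_bar = sum(
        sum(c * (c - 1) for c in cnt.values()) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Chance agreement P_e from the overall proportion p_j of each label.
    totals: Counter = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1:
        raise ValueError("kappa is undefined when every annotation has the same label")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: three annotators labelling four candidate relations.
    toy = [
        ("pos", "pos", "pos"),
        ("pos", "pos", "neg"),
        ("neg", "neg", "neg"),
        ("neg", "pos", "neg"),
    ]
    print(f"percent agreement: {pairwise_percent_agreement(toy):.3f}")  # 0.667
    print(f"Fleiss' kappa:     {fleiss_kappa(toy):.3f}")  # 0.333
```

With this pairwise definition, percent agreement equals Fleiss' observed agreement P_bar; kappa then discounts it by the agreement expected from the overall label proportions. On the toy data, agreement is 2/3 but kappa is only 1/3, because positive and negative labels are equally common and chance agreement is therefore 0.5.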

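The abstract reports overall F-scores for the rule-based model but not the matching criteria. A minimal sketch, assuming that a predicted relation counts as correct only if its sentence, plant, chemical and positive/negative polarity all match a gold annotation, and that "overall" means micro-averaged over all relations (both assumptions), computes precision, recall and F-score as below. The tuple layout and the placeholder names are hypothetical.

```python
from typing import Hashable, Set, Tuple

# Hypothetical representation: (sentence_id, plant, chemical, polarity).
Relation = Tuple[Hashable, str, str, str]


def precision_recall_f1(gold: Set[Relation], predicted: Set[Relation]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, counting only exact matches."""
    tp = len(gold & predicted)  # true positives: predicted relations also in the gold set
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical gold annotations and model output with placeholder names.
    gold = {
        (1, "plantA", "chemX", "pos"),
        (2, "plantB", "chemY", "pos"),
        (3, "plantC", "chemZ", "neg"),
        (4, "plantD", "chemW", "pos"),
    }
    predicted = {
        (1, "plantA", "chemX", "pos"),
        (2, "plantB", "chemY", "neg"),  # wrong polarity, so not a match
        (3, "plantC", "chemZ", "neg"),
    }
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.500 F=0.571
```

The F-score is the harmonic mean of precision and recall, 2PR/(P + R), so it stays low unless both are reasonably high.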
CONCLUSION

We expect that the corpus for plant-chemical relationships will be a useful resource for enhancing plant research. The corpus is available at http://combio.gist.ac.kr/plantchemicalcorpus.
