Simple tricks for improving pattern-based information extraction from the biomedical literature.

Authors

Nguyen Quang Long, Tikk Domonkos, Leser Ulf

Affiliation

Knowledge Management in Bioinformatics, Department for Computer Science, Humboldt-Universität zu Berlin, Germany.

Publication

J Biomed Semantics. 2010 Sep 24;1(1):9. doi: 10.1186/2041-1480-1-9.

DOI:10.1186/2041-1480-1-9

PMID:20868467

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2955645/

Abstract

BACKGROUND

Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns.

RESULTS

We propose several techniques for filtering sets of automatically generated patterns and analyze their effectiveness for different extraction tasks, as defined in the recent BioNLP 2009 shared task. We focus on simple methods that only take into account the complexity of the pattern and the complexity of the texts the patterns are applied to. We show that our techniques, despite their simplicity, yield large improvements in all tasks we analyzed. For instance, they raise the F-score for the task of extracting gene expression events from 24.8% to 51.9%.
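As a rough illustration of complexity-based filtering, the sketch below drops patterns and sentences whose size exceeds a threshold before any matching takes place, and computes the balanced F-score used to report results. The `Pattern` class, the use of token counts as the complexity measure, the direction of the filter (removing large items), and both thresholds are assumptions made for illustration only; the abstract does not say how complexity is measured or which items are removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern: a plain token sequence (not the paper's representation)."""
    tokens: tuple


def filter_patterns(patterns, max_pattern_len):
    """Keep only patterns with at most max_pattern_len tokens (assumed complexity measure)."""
    return [p for p in patterns if len(p.tokens) <= max_pattern_len]


def filter_sentences(sentences, max_sentence_len):
    """Keep only sentences with at most max_sentence_len whitespace-separated tokens."""
    return [s for s in sentences if len(s.split()) <= max_sentence_len]


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy inputs, invented for this example.
patterns = [
    Pattern(("expression", "of", "GENE")),
    Pattern(("GENE", "was", "found", "to", "be", "strongly", "expressed", "in")),
]
sentences = [
    "Expression of IL-2 was detected .",
    "In contrast , a much longer sentence with many clauses mentions IL-2 only in passing and is skipped .",
]

# Thresholds are arbitrary placeholders, not values from the paper.
kept_patterns = filter_patterns(patterns, max_pattern_len=5)
kept_sentences = filter_sentences(sentences, max_sentence_len=10)
```

Because both filters run before matching, fewer pattern and sentence pairs have to be compared, which is consistent with the speed-up the conclusions mention.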

CONCLUSIONS

Even very simple filtering techniques can significantly improve the F-score of an information extraction method based on automatically generated patterns. Furthermore, applying such methods yields a considerable speed-up, as fewer matches need to be analysed. Because of their simplicity, the proposed filtering techniques should also be applicable to other methods that use linguistic patterns for information extraction.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dffd/2955645/31da40d40f4e/2041-1480-1-9-1.jpg
