Wide coverage biomedical event extraction using multiple partially overlapping corpora.

Author information

The National Centre for Text Mining and School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.

Publication information

BMC Bioinformatics. 2013 Jun 3;14:175. doi: 10.1186/1471-2105-14-175.

DOI:10.1186/1471-2105-14-175

PMID:23731785

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3680179/

Abstract

BACKGROUND

Biomedical events are key to understanding physiological processes and disease, and wide coverage extraction is required for comprehensive automatic analysis of statements describing biomedical systems in the literature. In turn, the training and evaluation of extraction methods requires manually annotated corpora. However, as manual annotation is time-consuming and expensive, any single event-annotated corpus can only cover a limited number of semantic types. Although combined use of several such corpora could potentially allow an extraction system to achieve broad semantic coverage, there has been little research into learning from multiple corpora with partially overlapping semantic annotation scopes.

RESULTS

We propose a method for learning from multiple corpora with partial semantic annotation overlap, and implement this method to improve our existing event extraction system, EventMine. An evaluation using seven event-annotated corpora, covering 65 event types in total, shows that learning from overlapping corpora can produce a single, corpus-independent, wide-coverage extraction system that outperforms systems trained on single corpora and exceeds previously reported results on two established event extraction tasks from the BioNLP Shared Task 2011.
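The abstract does not spell out how partially overlapping annotation scopes are handled during training. As a minimal sketch of the general idea, and not EventMine's actual implementation, one option is to treat a missing annotation as a negative example only when the event type lies within the annotation scope of the corpus the instance came from, and to skip it otherwise. The corpus names, event types, and the functions `label_for` and `training_examples` below are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical annotation scopes: the event types each corpus annotates.
SCOPES: Dict[str, Set[str]] = {
    "corpus_a": {"Binding", "Phosphorylation"},
    "corpus_b": {"Phosphorylation", "Methylation"},
}


def label_for(corpus: str, event_type: str, gold_types: Set[str]) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (unknown, do not train on it)."""
    if event_type in gold_types:
        return 1
    if event_type in SCOPES[corpus]:
        # The corpus annotates this type and did not mark it: a true negative.
        return 0
    # The type is outside the corpus's scope, so its absence tells us nothing.
    return None


def training_examples(
    instances: List[Tuple[str, str, Set[str]]],
) -> List[Tuple[str, str, int]]:
    """Expand (corpus, text, gold_types) instances into per-type binary examples."""
    all_types = sorted(set().union(*SCOPES.values()))
    examples = []
    for corpus, text, gold_types in instances:
        for event_type in all_types:
            label = label_for(corpus, event_type, gold_types)
            if label is not None:
                examples.append((text, event_type, label))
    return examples
```

Under this assumption, a sentence from `corpus_a` with no `Methylation` annotation contributes no `Methylation` training example at all, so a single model can be trained on the union of event types without learning spurious negatives from corpora that never annotated a given type.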

CONCLUSIONS

The proposed method allows the training of a wide-coverage, state-of-the-art event extraction system from multiple corpora with partial semantic annotation overlap. The resulting single model makes broad-coverage extraction straightforward in practice by removing the need to either select a subset of compatible corpora or semantic types, or to merge results from several models trained on different individual corpora. Multi-corpus learning also allows annotation efforts to focus on covering additional semantic types, rather than aiming for exhaustive coverage in any single annotation effort, or extending the coverage of semantic types annotated in existing corpora.
