Semi-supervised method for biomedical event extraction.

Author information

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

Publication information

Proteome Sci. 2013 Nov 7;11(Suppl 1):S17. doi: 10.1186/1477-5956-11-S1-S17.

DOI:10.1186/1477-5956-11-S1-S17

PMID:24565105

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3909242/

Abstract

BACKGROUND

Biomedical event extraction based on supervised machine learning still faces the problem that a limited labeled dataset does not saturate the learning method. Many supervised learning algorithms for bio-event extraction are affected by data sparseness.

METHODS

In this study, we present a semi-supervised method that combines labeled data with large-scale unlabeled data to improve the performance of biomedical event extraction. We propose a rich feature set that includes a variety of syntactic and semantic features, such as N-gram features, walk subsequence features, and predicate argument structure (PAS) features, and in particular some new features derived from a strategy named Event Feature Coupling Generalization (EFCG). The EFCG algorithm creates useful event recognition features by exploiting the correlation between two sorts of original features drawn from the labeled data, where the correlation is computed with the help of massive amounts of unlabeled data. The EFCG approach aims to solve the data sparseness problem caused by the limited annotated corpus, and it enables the new features to cover much more event-related information with better generalization properties.
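The abstract does not specify how EFCG measures the correlation between the two sorts of features, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes pointwise mutual information (PMI) as the correlation measure, and the names `pmi_table`, `coupled_features`, and the toy feature strings are all hypothetical. Co-occurrence counts come from unlabeled instances, and each sparse feature of an example is mapped onto denser "coupled" features.

```python
import math
from collections import Counter

def pmi_table(unlabeled, kind_a, kind_b):
    """Estimate PMI between features of two kinds from unlabeled instances.

    unlabeled: iterable of sets of feature strings (one set per instance).
    kind_a, kind_b: sets naming the two sorts of original features.
    PMI is an assumed correlation measure, not taken from the paper.
    """
    n = 0
    count_a, count_b, count_ab = Counter(), Counter(), Counter()
    for feats in unlabeled:
        n += 1
        a_here = feats & kind_a
        b_here = feats & kind_b
        count_a.update(a_here)
        count_b.update(b_here)
        count_ab.update((a, b) for a in a_here for b in b_here)
    # PMI(a, b) = log( P(a, b) / (P(a) * P(b)) )
    return {
        (a, b): math.log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    }

def coupled_features(example_feats, kind_a, kind_b, table):
    """Map an example's kind_a features to real-valued coupled features.

    For each kind_b feature, keep the strongest positive PMI with any
    kind_a feature present in the example.
    """
    out = {}
    for a in example_feats & kind_a:
        for b in kind_b:
            score = table.get((a, b))
            if score is not None and score > 0:
                key = f"coupled:{b}"
                out[key] = max(out.get(key, 0.0), score)
    return out

# Toy unlabeled data (hypothetical feature strings).
unlabeled = [
    {"w=phosphorylation", "t=trigger_ctx"},
    {"w=phosphorylation", "t=trigger_ctx"},
    {"w=binds", "t=trigger_ctx"},
    {"w=the", "t=other_ctx"},
]
kind_a = {"w=phosphorylation", "w=binds", "w=the", "w=acetylation"}
kind_b = {"t=trigger_ctx", "t=other_ctx"}

table = pmi_table(unlabeled, kind_a, kind_b)
new_feats = coupled_features({"w=phosphorylation"}, kind_a, kind_b, table)
```

In this sketch, a word that is rare in the labeled corpus can still receive an informative feature value as long as it co-occurs with the second kind of feature in the unlabeled data; a word never seen there (for example `w=acetylation` above) yields no coupled features.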

RESULTS

The effectiveness of our event extraction system is evaluated on datasets from the BioNLP Shared Task 2011 and PubMed. Experimental results demonstrate state-of-the-art performance on this fine-grained biomedical information extraction task.

CONCLUSIONS

Limited labeled data can be combined with unlabeled data to tackle the data sparseness problem by means of our EFCG approach, and the classification capability of the model is enhanced by building a rich feature set from both labeled and unlabeled datasets. This semi-supervised learning approach can therefore go a long way towards improving the performance of the event extraction system. To the best of our knowledge, this is the first attempt at combining labeled and unlabeled data for tasks related to biomedical event extraction.
