Hedge Scope Detection in Biomedical Texts: An Effective Dependency-Based Method.

Author information

Zhou Huiwei, Deng Huijie, Huang Degen, Zhu Minling

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning, China.

School of Computer, Beijing Information Science and Technology University, Beijing, China.

Publication information

PLoS One. 2015 Jul 28;10(7):e0133715. doi: 10.1371/journal.pone.0133715. eCollection 2015.

Abstract

Hedge detection is used to distinguish uncertain information from facts, which is essential in biomedical information extraction. The task is often divided into two subtasks: detecting uncertainty cues and detecting their linguistic scope. A hedge scope is a sequence of tokens in a sentence that includes the hedge cue. Previous hedge scope detection methods usually take every token in a sentence as a candidate boundary, which inevitably generates a large number of negative instances for the classifiers. This class imbalance seriously misleads the classifiers and lowers performance. This paper proposes a dependency-based candidate boundary selection method (DCBS), which uses the dependency tree to select the most likely tokens as candidate boundaries and to remove exceptional tokens that have little potential to improve performance. In addition, we employ a composite kernel to integrate lexical and syntactic information and demonstrate the effectiveness of structured syntactic features for hedge scope detection. Experiments on the CoNLL-2010 Shared Task corpus show that our method achieves an F1-score of 71.92% on gold-standard cues, 4.11% higher than the system without DCBS. Although the candidate boundary selection method is evaluated here only on hedge scope detection, it can be generalized to other kinds of scope learning tasks.
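The abstract does not spell out the DCBS selection rules. As a rough illustration of the general idea only (shrinking the candidate set with a dependency tree instead of treating every token as a possible boundary), the Python sketch below keeps as candidates just the leftmost and rightmost tokens of the subtrees rooted at the cue and at each of its ancestors. This heuristic, the function names `children_map`, `subtree_span` and `candidate_boundaries`, and the hand-built parse in the example are all hypothetical; this is not the authors' DCBS.

```python
from typing import Dict, List, Set, Tuple


def children_map(heads: List[int]) -> Dict[int, List[int]]:
    """Map each token index to its dependents; heads[i] == -1 marks the root."""
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h >= 0:
            kids[h].append(i)
    return kids


def subtree_span(node: int, kids: Dict[int, List[int]]) -> Tuple[int, int]:
    """Return the (leftmost, rightmost) token index covered by node's subtree."""
    stack, lo, hi = [node], node, node
    while stack:
        n = stack.pop()
        lo, hi = min(lo, n), max(hi, n)
        stack.extend(kids[n])
    return lo, hi


def candidate_boundaries(heads: List[int], cue: int) -> Tuple[Set[int], Set[int]]:
    """Hypothetical selection rule: edges of the subtrees of the cue and its ancestors.

    Every such subtree contains the cue, so each candidate left boundary is <= cue
    and each candidate right boundary is >= cue. Assumes `heads` is a well-formed tree.
    """
    kids = children_map(heads)
    left: Set[int] = set()
    right: Set[int] = set()
    node = cue
    while node >= 0:  # climb from the cue up to the root
        lo, hi = subtree_span(node, kids)
        left.add(lo)
        right.add(hi)
        node = heads[node]
    return left, right


# Hand-built (hypothetical) parse of:
# 0 These 1 results 2 suggest 3 that 4 the 5 protein 6 may 7 regulate 8 transcription 9 .
heads = [1, 2, -1, 7, 5, 7, 7, 2, 7, 2]
left, right = candidate_boundaries(heads, cue=6)  # cue: "may"
print(sorted(left), sorted(right))  # [0, 3, 6] [6, 8, 9]
```

For this 10-token sentence, a classifier that considers every token sees 10 candidates per boundary (at most one positive, the rest negatives), whereas the sketch leaves 3 candidates per boundary. That is the kind of reduction in negative instances the abstract motivates, although the actual DCBS rules may differ.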
