Erik Velldal
Department of Informatics, University of Oslo, PO Box 1080 Blindern, 0316 Oslo, Norway.
J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S7. doi: 10.1186/2041-1480-2-S5-S7.
This paper presents a novel approach to the problem of hedge detection, which involves identifying so-called hedge cues for labeling sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in the biomedical domain. Here we propose to view hedge detection as a simple disambiguation problem, restricted to words that have previously been observed as hedge cues. Because the feature space for the classifier remains very large even with this restriction, we also perform experiments with dimensionality reduction using the method of random indexing.
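The disambiguation framing can be illustrated with a minimal Python sketch. It assumes tokenized sentences, a cue lexicon collected from training data, and that a sentence is labeled uncertain if at least one candidate word is classified as a cue. The token-level classifier is a hypothetical callable standing in for the SVM described here, and the toy rule in the usage lines (treating "May" followed by a number as a month name rather than a hedge cue) is purely illustrative, not a feature from the paper.

```python
from typing import Callable, Iterable

# Hypothetical type: given a tokenized sentence and the index of a candidate
# word, return True if that occurrence is used as a hedge cue. In the paper
# this role is played by an SVM; here it is simply a parameter.
CueClassifier = Callable[[list[str], int], bool]


def build_cue_lexicon(training_cues: Iterable[str]) -> set[str]:
    """Collect every (lowercased) word observed as a hedge cue in training."""
    return {w.lower() for w in training_cues}


def candidate_positions(tokens: list[str], lexicon: set[str]) -> list[int]:
    """Only tokens previously seen as cues are passed to the classifier."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in lexicon]


def label_sentence(tokens: list[str], lexicon: set[str],
                   classify: CueClassifier) -> str:
    """Label 'uncertain' if at least one candidate is classified as a cue."""
    for i in candidate_positions(tokens, lexicon):
        if classify(tokens, i):
            return "uncertain"
    return "certain"


# Usage with a hypothetical lexicon and a toy stand-in for a trained model.
lexicon = build_cue_lexicon(["may", "suggest", "possibly"])


def toy_classifier(tokens: list[str], i: int) -> bool:
    # Toy rule: "may" followed by a number is read as the month, not a cue.
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return not nxt.isdigit()


print(label_sentence("These results may indicate a role".split(), lexicon, toy_classifier))   # uncertain
print(label_sentence("Samples were collected in May 2010".split(), lexicon, toy_classifier))  # certain
print(label_sentence("The protein binds DNA".split(), lexicon, toy_classifier))               # certain
```

Restricting classification to lexicon words means sentences without any known cue are labeled certain without consulting the classifier at all, which is what turns the task into a word-level disambiguation problem.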
The SVM-based classifiers developed in this paper achieve the best published results so far for sentence-level uncertainty prediction on the CoNLL-2010 Shared Task test data. We also show that the technique of random indexing can be successfully applied to reduce the dimensionality of the original feature space by several orders of magnitude without sacrificing classifier performance.
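Random indexing can be sketched in a few lines of Python. In this sketch each original feature is lazily assigned a fixed, sparse ternary "index vector" in a much smaller space, and an instance is represented by the value-weighted sum of the index vectors of its active features. The target dimensionality (1000), the number of nonzero entries (8), and the feature names are illustrative assumptions, not the settings used in the paper.

```python
import random


class RandomIndexer:
    """Map sparse, high-dimensional feature dicts into `dim` dimensions."""

    def __init__(self, dim: int = 1000, nonzeros: int = 8, seed: int = 0):
        # Illustrative defaults; nonzeros must be even so that +1/-1 balance.
        assert nonzeros % 2 == 0 and 0 < nonzeros <= dim
        self.dim = dim
        self.nonzeros = nonzeros
        self.rng = random.Random(seed)
        self.index_vectors: dict[str, list[tuple[int, int]]] = {}

    def index_vector(self, feature: str) -> list[tuple[int, int]]:
        """Return (position, sign) pairs, created on first use and then reused."""
        if feature not in self.index_vectors:
            positions = self.rng.sample(range(self.dim), self.nonzeros)
            # Half the nonzero entries are +1 and half are -1.
            signs = [1, -1] * (self.nonzeros // 2)
            self.rng.shuffle(signs)
            self.index_vectors[feature] = list(zip(positions, signs))
        return self.index_vectors[feature]

    def transform(self, features: dict[str, float]) -> list[float]:
        """Sum the weighted index vectors of all active features."""
        reduced = [0.0] * self.dim
        for feat, value in features.items():
            for pos, sign in self.index_vector(feat):
                reduced[pos] += sign * value
        return reduced


# Usage with hypothetical n-gram-style feature names.
ri = RandomIndexer(dim=1000, nonzeros=8, seed=42)
x = ri.transform({"word=may": 1.0, "next=indicate": 1.0, "prev=results": 1.0})
print(len(x))  # 1000
```

However many distinct features the original space contains, every instance ends up with `dim` dimensions, and no full projection matrix is ever stored. Because index vectors are assigned in order of first encounter, the same `RandomIndexer` instance has to be reused for training and test data so that each feature keeps its vector.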
This paper introduces a simplified approach to detecting speculation or uncertainty in text, focusing on the biomedical domain. Evaluated at the sentence level, our SVM-based classifiers achieve the best published results so far. We also show that the feature space can be aggressively compressed using random indexing while still maintaining comparable classifier performance.