A multi-aspect comparison study of supervised word sense disambiguation.

Authors

Liu Hongfang, Teller Virginia, Friedman Carol

Affiliation

Department of Information Systems, University of Maryland at Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.

Publication information

J Am Med Inform Assoc. 2004 Jul-Aug;11(4):320-31. doi: 10.1197/jamia.M1533. Epub 2004 Apr 2.

Abstract

OBJECTIVE

The aim of this study was to investigate relations among different aspects in supervised word sense disambiguation (WSD; supervised machine learning for disambiguating the sense of a term in a context) and compare supervised WSD in the biomedical domain with that in the general English domain.

METHODS

The study involves three data sets (a biomedical abbreviation data set, a general biomedical term data set, and a general English data set). The authors implemented three machine-learning algorithms, including (1) naïve Bayes (NBL) and decision lists (TDLL), (2) their adaptation of decision lists (ODLL), and (3) their mixed supervised learning (MSL). There were six feature representations (various combinations of collocations, bag of words, oriented bag of words, etc.) and five window sizes (2, 4, 6, 8, and 10).
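The feature representations and window sizes mentioned above can be illustrated with a minimal sketch. The abstract does not define the representations precisely, so the definitions below are assumptions: "bag of words" is taken as the unordered words inside the window, "oriented bag of words" as the same words tagged by side (left or right of the target), and "collocations" as the words immediately adjacent to the target. The function names and feature-string formats are hypothetical.

```python
from typing import List, Set, Tuple


def context_window(tokens: List[str], target: int, size: int) -> List[Tuple[int, str]]:
    """Return (offset, word) pairs within `size` tokens on each side of the target."""
    lo = max(0, target - size)
    hi = min(len(tokens), target + size + 1)
    return [(i - target, tokens[i].lower()) for i in range(lo, hi) if i != target]


def extract_features(tokens: List[str], target: int, size: int,
                     oriented: bool = False, collocations: bool = True) -> Set[str]:
    """Build a feature set for the ambiguous token at index `target`.

    Assumed definitions (not taken from the paper):
      - bag of words: words in the window, position ignored
      - oriented bag of words: words in the window, tagged L or R of the target
      - collocations: the words directly adjacent to the target, with offset
    """
    feats: Set[str] = set()
    for offset, word in context_window(tokens, target, size):
        if oriented:
            side = "L" if offset < 0 else "R"
            feats.add(f"bow_{side}:{word}")
        else:
            feats.add(f"bow:{word}")
        if collocations and abs(offset) == 1:
            feats.add(f"col[{offset:+d}]:{word}")
    return feats


tokens = "the patient had a cold and fever".split()
features = extract_features(tokens, target=4, size=2)
```

Varying `size` over 2, 4, 6, 8, and 10 and toggling `oriented` and `collocations` gives one way to enumerate combinations of the kind the study compares.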

RESULTS

Supervised WSD is suitable only when there are enough sense-tagged instances, with at least a few dozen instances for each sense. Collocations combined with neighboring words are appropriate selections for the context. For terms with unrelated biomedical senses, a large window size such as the whole paragraph should be used, while for general English words a moderate window size between 4 and 10 should be used. For abbreviations, the authors' implementation of decision list classifiers performed better than traditional decision list classifiers. However, the opposite held for the other two sets. Also, the authors' mixed supervised learning was stable and generally better than the others for all sets.
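For readers unfamiliar with decision list classifiers, the following is a minimal generic sketch, in the style commonly attributed to Yarowsky: each feature becomes a rule that votes for its most frequent sense, rules are ranked by a smoothed log-odds score, and the first matching rule decides. It is not the authors' TDLL or ODLL implementation, whose details the abstract does not give; the smoothing constant `alpha` and the function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(examples, alpha=0.1):
    """examples: list of (feature_set, sense). Returns (rules, default_sense)."""
    sense_counts = Counter(sense for _, sense in examples)
    default = sense_counts.most_common(1)[0][0]
    n_senses = len(sense_counts)
    if n_senses < 2:
        # Only one sense observed: nothing to disambiguate.
        return [], default

    # Count how often each feature co-occurs with each sense.
    feat_sense = defaultdict(Counter)
    for feats, sense in examples:
        for f in feats:
            feat_sense[f][sense] += 1

    rules = []
    for f, counts in feat_sense.items():
        total = sum(counts.values())
        best, best_n = counts.most_common(1)[0]
        # Smoothed probability of the best sense given the feature.
        p = (best_n + alpha) / (total + alpha * n_senses)
        rules.append((math.log(p / (1 - p)), f, best))

    # Strongest evidence first; feature name breaks ties deterministically.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules, default


def classify(rules, default, feats):
    """Apply the first (highest-scoring) rule whose feature is present."""
    for _, f, sense in rules:
        if f in feats:
            return sense
    return default


examples = [
    ({"bow:nose", "bow:fever"}, "illness"),
    ({"bow:fever", "bow:cough"}, "illness"),
    ({"bow:weather", "bow:winter"}, "temperature"),
    ({"bow:winter", "bow:wind"}, "temperature"),
    ({"bow:fever"}, "illness"),
]
rules, default = train_decision_list(examples)
```

With only a handful of instances per sense, the rule scores rest on very small counts, which is consistent with the finding above that each sense needs at least a few dozen tagged instances.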

CONCLUSION

The study found that different aspects of supervised WSD depend on each other. The experimental method presented in the study can be used to select the best supervised WSD classifier for each ambiguous term.
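Selecting a classifier per ambiguous term can be sketched as a cross-validation loop over candidate configurations. The abstract does not describe the authors' evaluation protocol, so the k-fold scheme below is an assumption, and the two candidate trainers (`majority_trainer`, `overlap_trainer`) are hypothetical stand-ins rather than NBL, TDLL, ODLL, or MSL.

```python
from collections import Counter


def majority_trainer(train):
    """Baseline: always predict the most frequent sense in the training data."""
    top = Counter(sense for _, sense in train).most_common(1)[0][0]
    return lambda feats: top


def overlap_trainer(train):
    """Stand-in classifier: pick the sense whose training features overlap most."""
    sense_feats = {}
    for feats, sense in train:
        sense_feats.setdefault(sense, Counter()).update(feats)
    fallback = majority_trainer(train)

    def predict(feats):
        scores = {s: sum(c[f] for f in feats) for s, c in sense_feats.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else fallback(feats)

    return predict


def cv_accuracy(trainer, examples, k=5):
    """k-fold cross-validated accuracy; requires 2 <= k <= len(examples)."""
    correct = 0
    for fold in range(k):
        test = examples[fold::k]
        train = [ex for i, ex in enumerate(examples) if i % k != fold]
        predict = trainer(train)
        correct += sum(predict(feats) == sense for feats, sense in test)
    return correct / len(examples)


def select_best(candidates, examples, k=5):
    """Return (name of best candidate, all scores) for one ambiguous term."""
    scores = {name: cv_accuracy(t, examples, k) for name, t in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy data for one ambiguous term: senses alternate, each with a telltale feature.
examples = []
for i in range(6):
    examples.append(({"fever", f"x{i}"}, "illness"))
    examples.append(({"winter", f"y{i}"}, "temperature"))

candidates = {"majority": majority_trainer, "overlap": overlap_trainer}
best, scores = select_best(candidates, examples, k=3)
```

In practice the `candidates` dictionary would hold one entry per combination of algorithm, feature representation, and window size, and the selection would be repeated for every ambiguous term.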

Similar articles

Determining the difficulty of Word Sense Disambiguation.
J Biomed Inform. 2014 Feb;47:83-90. doi: 10.1016/j.jbi.2013.09.009. Epub 2013 Sep 26.

A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.

Cited by

Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings.
Proc IEEE Int Symp Bioinformatics Bioeng. 2017 Oct;2017:163-170. doi: 10.1109/BIBE.2017.00-61. Epub 2018 Jan 11.

A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
Appl Clin Inform. 2015 Jun 3;6(2):364-74. doi: 10.4338/ACI-2014-10-RA-0088. eCollection 2015.
