Al-Mubaid Hisham, Gungu Sandeep
University of Houston-Clear Lake, Houston, TX 77058, USA.
ScientificWorldJournal. 2012;2012:949247. doi: 10.1100/2012/949247. Epub 2012 May 1.
Word sense ambiguity is widespread in the biomedical domain, yet the bioinformatics research effort devoted to it has not been commensurate with the problem, leaving considerable room for development. This paper presents and evaluates a learning-based approach to word sense disambiguation in the biomedical domain. The main limitation of supervised methods is their need for a corpus of manually disambiguated instances of the ambiguous words. However, advances in automatic text annotation and tagging, supported by the many knowledge sources available in the biomedical domain, such as ontologies and the text literature, should help lessen this limitation. The proposed method uses the interaction model (mutual information) between the context words and the senses of the target word to induce reliable learning models for sense disambiguation. The method was evaluated on the benchmark dataset NLM-WSD under various settings and on biomedical entity species disambiguation. The results show that the approach is very competitive and outperforms recently reported results of other published techniques.
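The abstract does not give the exact formulation of the interaction model, so the following is only a minimal sketch of one plausible reading: pointwise mutual information (PMI) between each context word and each sense, estimated from sense-labeled instances, with a new context assigned to the sense whose context words have the highest summed PMI. The function names `train_pmi` and `disambiguate`, the toy training data for the ambiguous word "cold", and the summation scoring rule are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train_pmi(instances):
    """Estimate PMI(word, sense) from (context_words, sense) training pairs.

    Hypothetical helper: probabilities are estimated as the fraction of
    training instances in which a word, a sense, or both occur.
    """
    n = len(instances)
    sense_count = Counter()
    word_count = Counter()
    joint_count = Counter()
    for words, sense in instances:
        sense_count[sense] += 1
        for w in set(words):  # count each word once per instance
            word_count[w] += 1
            joint_count[(w, sense)] += 1
    pmi = {}
    for (w, s), c in joint_count.items():
        # PMI(w, s) = log2( P(w, s) / (P(w) * P(s)) )
        pmi[(w, s)] = math.log2(c * n / (word_count[w] * sense_count[s]))
    return pmi, sorted(sense_count)


def disambiguate(context_words, pmi, senses):
    """Pick the sense with the highest summed PMI over the context words.

    Unseen (word, sense) pairs contribute 0 (an assumption of this sketch).
    """
    scores = {
        s: sum(pmi.get((w, s), 0.0) for w in set(context_words))
        for s in senses
    }
    return max(senses, key=lambda s: scores[s])


# Toy, made-up training instances for the ambiguous word "cold".
training = [
    (["patient", "virus", "symptoms"], "illness"),
    (["virus", "cough", "fever"], "illness"),
    (["water", "temperature", "storage"], "temperature"),
    (["temperature", "exposure", "weather"], "temperature"),
]

pmi, senses = train_pmi(training)
predicted = disambiguate(["fever", "virus"], pmi, senses)
print(predicted)
```

In practice a supervised system of this kind would typically feed such word-sense association scores into a trained classifier rather than summing them directly; the summation here only keeps the sketch short.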