Wang Yue, Zheng Kai, Xu Hua, Mei Qiaozhu
Department of EECS, University of Michigan, Ann Arbor, MI, USA.
Department of Informatics, University of California, Irvine, CA, USA.
AMIA Annu Symp Proc. 2017 Feb 10;2016:2062-2071. eCollection 2016.
Resolving word ambiguity in clinical text is critical for many natural language processing applications. Effective word sense disambiguation (WSD) systems rely on training a machine-learning classifier on abundant, accurately annotated clinical text, which is costly and time-consuming to create. We describe a double-loop interactive machine learning process, named ReQ-ReC (ReQuery-ReClassify), and demonstrate its effectiveness on multiple evaluation corpora. Using ReQ-ReC, a human expert first uses her domain knowledge to add sense-specific contextual words in the ReQuery loops and searches for instances relevant to each sense. Then, in the ReClassify loops, the expert annotates only the most ambiguous instances found by the current WSD model. Even with machine-generated queries only, the framework is comparable with or faster than current active learning methods in building WSD models. The process can be further accelerated when human experts use their domain knowledge to guide the search.
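The double-loop idea described above can be illustrated with a small toy sketch. This is not the authors' implementation: the keyword search, the word-count "classifier", the margin-based ambiguity measure, the query-expansion rule, and every name below (`req_rec`, `search`, `train`, `margin`, `predict`, `per_round`, the example corpus) are assumptions made for illustration only. The outer loop re-queries the corpus for candidate instances; the inner loop asks the expert (simulated here by an `oracle` function) to label only the instance the current model finds most ambiguous.

```python
from collections import Counter

def tokens(text):
    return text.lower().split()

def search(corpus, query_words, pool):
    """ReQuery step (toy): add every instance that contains a query word."""
    for i, text in enumerate(corpus):
        if query_words & set(tokens(text)):
            pool.add(i)

def train(corpus, labels):
    """Toy model (assumption): per-sense word counts over labeled instances."""
    model = {}
    for i, sense in labels.items():
        model.setdefault(sense, Counter()).update(tokens(corpus[i]))
    return model

def sense_scores(model, text, senses):
    # A sense with no labeled instances yet scores 0.
    return {s: sum(model.get(s, Counter())[w] for w in tokens(text)) for s in senses}

def margin(model, text, senses):
    """Gap between the two best sense scores; a small gap means ambiguous."""
    top = sorted(sense_scores(model, text, senses).values(), reverse=True)
    return top[0] - top[1]

def predict(model, text, senses):
    sc = sense_scores(model, text, senses)
    return max(senses, key=lambda s: sc[s])

def req_rec(corpus, target, senses, seed_queries, oracle, budget, per_round=2):
    pool, labels, model = set(), {}, {}
    queries = {s: set(seed_queries[s]) for s in senses}
    while len(labels) < budget:
        labeled_before = len(labels)
        # Outer loop (ReQuery): retrieve candidates for each sense.
        for s in senses:
            search(corpus, queries[s], pool)
        # Inner loop (ReClassify): label the most ambiguous instances.
        for _ in range(per_round):
            unlabeled = sorted(pool - set(labels))
            if not unlabeled or len(labels) >= budget:
                break
            pick = min(unlabeled, key=lambda i: margin(model, corpus[i], senses))
            labels[pick] = oracle(pick)
            model = train(corpus, labels)
        if len(labels) == labeled_before:
            break  # nothing new to label; stop instead of looping forever
        # Machine-generated queries (assumption): reuse context words of
        # labeled instances, excluding the ambiguous target word itself.
        for i, s in labels.items():
            queries[s].update(w for w in tokens(corpus[i]) if w != target)
    return model, labels

# Hypothetical example: the ambiguous word "cold" with two senses.
corpus = [
    "patient has a cold with cough and fever",
    "cold with runny nose and cough",
    "extremities feel cold to touch",
    "applied cold compress to knee",
    "cold symptoms include cough",
    "skin cold and clammy to touch",
]
gold = ["illness", "illness", "temperature", "temperature", "illness", "temperature"]
senses = ["illness", "temperature"]
seeds = {"illness": {"cough"}, "temperature": {"touch"}}  # expert-supplied context words

model, labels = req_rec(corpus, "cold", senses, seeds, lambda i: gold[i], budget=4)
for i in sorted(set(range(len(corpus))) - set(labels)):
    print(i, predict(model, corpus[i], senses))
```

In this sketch the expert's domain knowledge enters only through the seed context words and the labels; a real system would replace the word-count model with a trained classifier and the substring match with a proper search engine.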