Moon Sungrim, Berster Bjoern-Toby, Xu Hua, Cohen Trevor
The University of Texas School of Biomedical Informatics at Houston, Houston, TX.
Drchrono, Mountain View, CA.
AMIA Annu Symp Proc. 2013 Nov 16;2013:1007-16. eCollection 2013.
Automated Word Sense Disambiguation in clinical documents is a prerequisite to accurate extraction of medical information. Emerging methods utilizing hyperdimensional computing present new approaches to this problem. In this paper, we evaluate one such approach, the Binary Spatter Code Word Sense Disambiguation algorithm, on 50 ambiguous abbreviation sets derived from clinical notes. This algorithm uses reversible vector transformations to encode ambiguous terms and their context-specific senses into vectors representing surrounding terms. The sense for a new context is then inferred from vectors representing the terms it contains. One-to-one BSC-WSD achieves average accuracy of 94.55% when considering the orientation and distance of neighboring terms relative to the target abbreviation, outperforming Support Vector Machine and Naïve Bayes classifiers. Furthermore, it is practical to deal with all 50 abbreviations in an identical manner using a single one-to-many BSC-WSD model with average accuracy of 93.91%, which is not possible with common machine learning algorithms.
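The encode-then-infer idea described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the usual Binary Spatter Code operations (XOR as the reversible binding, bitwise majority vote for superposition, Hamming-based similarity) and ignores the orientation and distance of neighboring terms. The class name `BscWsdSketch`, the dimensionality `DIM`, and the toy training contexts are all hypothetical and invented for illustration.

```python
import numpy as np

DIM = 10_000                      # hypothetical dimensionality, chosen for illustration
rng = np.random.default_rng(0)    # fixed seed so the sketch is repeatable


def random_vector():
    """Random dense binary vector (an 'elemental' vector)."""
    return rng.integers(0, 2, size=DIM, dtype=np.uint8)


def bind(a, b):
    """XOR binding; it is its own inverse, so bind(bind(a, b), a) == b."""
    return np.bitwise_xor(a, b)


def bundle(vectors):
    """Bitwise majority vote over a list of vectors; ties are split at random."""
    counts = np.stack(vectors).sum(axis=0, dtype=np.int64)
    n = len(vectors)
    out = (2 * counts > n).astype(np.uint8)
    ties = 2 * counts == n
    out[ties] = rng.integers(0, 2, size=int(ties.sum()), dtype=np.uint8)
    return out


def similarity(a, b):
    """1.0 for identical vectors, near 0.0 for unrelated random vectors."""
    return 1.0 - 2.0 * float(np.mean(a != b))


class BscWsdSketch:
    """Toy one-to-many model: one term-vector store shared by all abbreviations."""

    def __init__(self):
        self.elemental = {}   # ("abbr" | "sense", name) -> random vector
        self.evidence = {}    # context term -> list of bound (abbr XOR sense) vectors

    def _elem(self, key):
        if key not in self.elemental:
            self.elemental[key] = random_vector()
        return self.elemental[key]

    def train(self, abbr, sense, context_terms):
        # Encode the (abbreviation, sense) pair into every surrounding term.
        bound = bind(self._elem(("abbr", abbr)), self._elem(("sense", sense)))
        for term in context_terms:
            self.evidence.setdefault(term, []).append(bound)

    def predict(self, abbr, senses, context_terms):
        known = [bundle(self.evidence[t]) for t in context_terms if t in self.evidence]
        if not known:
            return None  # no trained term in this context
        # Superpose the context, then reverse the binding with the abbreviation.
        probe = bind(bundle(known), self._elem(("abbr", abbr)))
        return max(senses, key=lambda s: similarity(probe, self._elem(("sense", s))))


# Invented toy contexts, not data from the paper.
model = BscWsdSketch()
model.train("pt", "patient", ["the", "was", "admitted", "with", "fever"])
model.train("pt", "patient", ["seen", "in", "clinic", "admitted"])
model.train("pt", "physical therapy", ["referred", "to", "for", "gait", "training"])
model.train("ra", "rheumatoid arthritis", ["joint", "pain", "methotrexate"])
model.train("ra", "right atrium", ["echo", "dilated", "pressure"])

print(model.predict("pt", ["patient", "physical therapy"], ["admitted", "with", "cough"]))
print(model.predict("ra", ["rheumatoid arthritis", "right atrium"], ["dilated", "echo"]))
```

Because XOR is self-inverse, binding the superposed context vector with the abbreviation's vector yields a noisy copy of the sense vector, which is then matched against the candidate senses. The same store serves both "pt" and "ra", which mirrors the single one-to-many model mentioned in the abstract.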