Rudniy Alex, Geller James, Song Min
NJIT, Newark, NJ.
AMIA Annu Symp Proc. 2010 Nov 13;2010:697-701.
Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms, using two UMLS samples as a test bed; one of those samples is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into the problem of finding a shortest path from a source to a destination in a node-and-link graph that is constructed from the two strings. The Pulling algorithm is applied to find a shortest path, and the length of that path determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.
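The reduction from edit distance to a shortest-path problem can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard edit graph, in which node (i, j) means that the first i characters of one string have been turned into the first j characters of the other, and it assumes unit costs for insertion, deletion and substitution. Because that graph is acyclic, each node can "pull" its distance from its predecessors in a single pass, which is how the Pulling step is read here. The names `sped_distance` and `similarity`, and the normalization by the longer string's length, are hypothetical choices made for illustration.

```python
def sped_distance(s: str, t: str) -> int:
    """Shortest-path length from node (0, 0) to node (len(s), len(t))
    in the edit graph of s and t (unit edge costs are an assumption)."""
    rows, cols = len(s) + 1, len(t) + 1
    inf = float("inf")
    dist = [[inf] * cols for _ in range(rows)]
    dist[0][0] = 0  # source node

    # Visit nodes in row-major order, which is a topological order of the
    # grid graph; each node pulls the best distance from its predecessors.
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:  # edge from (i-1, j): delete s[i-1]
                best = min(best, dist[i - 1][j] + 1)
            if j > 0:  # edge from (i, j-1): insert t[j-1]
                best = min(best, dist[i][j - 1] + 1)
            if i > 0 and j > 0:  # diagonal edge: match (0) or substitute (1)
                cost = 0 if s[i - 1] == t[j - 1] else 1
                best = min(best, dist[i - 1][j - 1] + cost)
            dist[i][j] = best
    return dist[-1][-1]  # distance at the destination node


def similarity(s: str, t: str) -> float:
    """Map the path length to [0, 1]; this normalization is an assumption."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - sped_distance(s, t) / longest
```

Under these assumptions, two synonymous terms that differ by a single character, such as "heart attack" and "heart attacks", are one edge of cost 1 away from a perfect match, so their similarity is close to 1.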