Shortest Path Edit Distance for Enhancing UMLS Integration and Audit.

Author information

Rudniy Alex, Geller James, Song Min

Affiliation

NJIT, Newark, NJ.

Publication information

AMIA Annu Symp Proc. 2010 Nov 13;2010:697-701.

Abstract

Expansion of the UMLS is an important long-term research project. This paper proposes Shortest Path Edit Distance (SPED) as an algorithm for improving existing source-integration and auditing techniques. We use SPED as a string similarity measure for UMLS terms that are known to be synonyms because they are assigned to the same concept. We compare SPED with several other well-known string matching algorithms, using two UMLS samples as a test bed; one of those samples is SNOMED-based. SPED transforms the task of calculating the edit distance between two strings into the problem of finding a shortest path from a source to a destination in a node-and-link graph, which is constructed from the two strings. The Pulling algorithm is applied to find a shortest path, which determines the string similarity value. SPED was superior for one of the data sets, with a precision of 0.6.
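The abstract does not spell out how the graph is built or how the Pulling algorithm works, so the following is only a minimal sketch of the general idea of treating edit distance as a shortest-path problem. It assumes the classic edit graph for Levenshtein distance (nodes are pairs of prefix lengths, links are unit-cost insertions, deletions, and substitutions, with zero cost for a match) and swaps in Dijkstra's algorithm in place of the Pulling algorithm. The function names and the length-based normalization into a similarity value are hypothetical and are not taken from the paper.

```python
import heapq


def edit_distance_shortest_path(a: str, b: str) -> int:
    """Levenshtein distance found as a shortest path on the edit graph.

    Node (i, j) means "the first i characters of a have been turned into
    the first j characters of b". Assumption: unit costs, Dijkstra search.
    """
    target = (len(a), len(b))
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (cost so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist[(i, j)]:
            continue  # stale heap entry, a cheaper path was already found
        links = []
        if i < len(a):
            links.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            links.append((i, j + 1, 1))  # insert b[j]
        if i < len(a) and j < len(b):
            # match costs 0, substitution costs 1
            links.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        for ni, nj, weight in links:
            nd = d + weight
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return dist[target]  # not reached: the target is always reachable


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance_shortest_path(a, b) / longest


if __name__ == "__main__":
    print(edit_distance_shortest_path("kitten", "sitting"))
    print(similarity("heart attack", "heart attacks"))
```

A threshold on such a similarity value could then be used to flag candidate synonym pairs, although the thresholds and cost scheme used in the paper are not given in the abstract.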
