A fuzzy-match search engine for physician directories.

Affiliation

Marshfield Clinic Research Foundation, Biomedical Informatics Research Center, Marshfield, WI, United States.

Publication information

JMIR Med Inform. 2014 Nov 4;2(2):e30. doi: 10.2196/medinform.3463.

Abstract

BACKGROUND

A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names.

OBJECTIVE

The Marshfield Clinic website provides a search engine for users to search for physicians' names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to help users find physicians more easily and quickly.

METHODS

Instead of an exact-match search, we used a fuzzy algorithm to find similar matches for search terms. The algorithm addresses three types of search engine failure: "Typographic", "Phonetic spelling variation", and "Nickname". To resolve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data.
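The abstract does not describe how the three components are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' implementation. The nickname table, the `PHONETIC_BONUS` weight, and the function names `fuzzy_score` and `search` are all hypothetical; the Levenshtein and Soundex routines follow their standard textbook definitions.

```python
# Hypothetical nickname table; the paper derives its table from US census data.
NICKNAMES = {"bill": "william", "bob": "robert", "kate": "katherine"}

# Hypothetical weight: how much a Soundex match reduces the edit distance.
PHONETIC_BONUS = 0.5

_SOUNDEX_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                   ("l", "4"), ("mn", "5"), ("r", "6"))
_SOUNDEX_CODES = {ch: digit for letters, digit in _SOUNDEX_GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Standard four-character Soundex code (first letter plus three digits)."""
    letters = "".join(ch for ch in name.lower() if ch.isalpha())
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeated codes across a vowel count
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_score(query: str, candidate: str) -> float:
    """Lower is better. Nicknames are expanded before comparison."""
    q = NICKNAMES.get(query.lower(), query.lower())
    c = NICKNAMES.get(candidate.lower(), candidate.lower())
    score = float(levenshtein(q, c))
    if soundex(q) == soundex(c):
        score -= PHONETIC_BONUS  # reward phonetically equivalent spellings
    return score


def search(query: str, directory: list[str], k: int = 10) -> list[str]:
    """Return the k directory names with the lowest fuzzy score."""
    return sorted(directory, key=lambda name: fuzzy_score(query, name))[:k]
```

In this sketch, `search("bill", ["William", "Robert", "Walter"], k=1)` ranks "William" first through the nickname table, and `search("Smyth", ["Smith", "Jones"], k=1)` ranks "Smith" first because the two spellings differ by one edit and share a Soundex code.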

RESULTS

Using the "Challenge Data Set of Marshfield Physician Names," we evaluated the accuracy of the fuzzy-match engine's top-ten results (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and the fuzzy-match engine's top-one result (71%).
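The abstract does not define how accuracy was computed. Assuming that "top ten" and "top one" mean the intended physician appears among the first ten results or as the first result, the metric could be sketched as below. The function name `top_k_accuracy` and the toy queries are illustrative only and are not taken from the Challenge Data Set.

```python
from typing import Callable, Sequence


def top_k_accuracy(cases: Sequence[tuple[str, str]],
                   search: Callable[[str], Sequence[str]],
                   k: int) -> float:
    """Fraction of queries whose expected name appears in the first k results."""
    hits = sum(1 for query, expected in cases if expected in search(query)[:k])
    return hits / len(cases)


# Toy ranked results standing in for a real search engine (made-up data).
RANKED = {
    "jon": ["John", "Joan"],
    "smyth": ["Smithe", "Smith"],
    "bil": ["Bob"],
}


def toy_search(query: str) -> list[str]:
    return RANKED.get(query, [])


# Each case pairs a misspelled query with the name the user intended.
cases = [("jon", "John"), ("smyth", "Smith"), ("bil", "Bill")]
top1 = top_k_accuracy(cases, toy_search, k=1)  # only "jon" hits at rank 1
top2 = top_k_accuracy(cases, toy_search, k=2)  # "smyth" also hits at rank 2
```

Widening k can only keep or raise the score, which is consistent with the reported top-ten figure being higher than the top-one figure.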

CONCLUSIONS

We designed and evaluated a fuzzy-match search engine for physician directories and created a reference implementation. The open-source code is available at the codeplex website, and the reference implementation is available for demonstration at the datamarsh website.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/844b/4288075/690c038bce0e/medinform_v2i2e30_fig1.jpg
