
A fuzzy-match search engine for physician directories.

Affiliation

Marshfield Clinic Research Foundation, Biomedical Informatics Research Center, Marshfield, WI, United States.

Publication information

JMIR Med Inform. 2014 Nov 4;2(2):e30. doi: 10.2196/medinform.3463.

DOI:10.2196/medinform.3463
PMID:25601050
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4288075/
Abstract

BACKGROUND

A search engine to find physicians' information is a basic but crucial function of a health care provider's website. Inefficient search engines, which return no results or incorrect results, can lead to patient frustration and potential customer loss. A search engine that can handle misspellings and spelling variations of names is needed, as the United States (US) has culturally, racially, and ethnically diverse names.

OBJECTIVE

The Marshfield Clinic website provides a search engine for users to search for physicians' names. The current search engine provides an auto-completion function, but it requires an exact match. We observed that 26% of all searches yielded no results. The goal was to design a fuzzy-match algorithm to help users find physicians more easily and quickly.

METHODS

Instead of an exact match search, we used a fuzzy algorithm to find similar matches for searched terms. In the algorithm, we solved three types of search engine failures: "Typographic", "Phonetic spelling variation", and "Nickname". To solve these mismatches, we used a customized Levenshtein distance calculation that incorporated Soundex coding and a lookup table of nicknames derived from US census data.
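The three ingredients named in the Methods (edit distance, Soundex coding, and a nickname lookup) can be combined in several ways, and the abstract does not specify the authors' customization. The following is a minimal sketch under stated assumptions: the nickname table, the `PHONETIC_BONUS` weight, and the functions `fuzzy_score` and `search` are hypothetical names chosen for illustration, and the sketch scores single name tokens rather than full physician records.

```python
from typing import Dict, List, Set

# Hypothetical nickname table; the paper derives its table from US census data.
NICKNAMES: Dict[str, Set[str]] = {
    "bill": {"william"},
    "bob": {"robert"},
    "liz": {"elizabeth"},
}

# Assumed weight: how much a Soundex match reduces the edit distance.
PHONETIC_BONUS = 0.5

_SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Standard American Soundex: first letter followed by three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_score(query: str, candidate: str) -> float:
    """Lower is better. Nicknames are expanded before measuring distance."""
    q, c = query.lower(), candidate.lower()
    variants = {q} | NICKNAMES.get(q, set())
    dist: float = min(levenshtein(v, c) for v in variants)
    if dist > 0 and soundex(q) == soundex(c):
        dist -= PHONETIC_BONUS  # reward phonetically equivalent spellings
    return dist


def search(query: str, directory: List[str], top_n: int = 10) -> List[str]:
    """Return the top_n directory names ranked by ascending fuzzy score."""
    return sorted(directory, key=lambda name: (fuzzy_score(query, name), name))[:top_n]


if __name__ == "__main__":
    print(search("Jonson", ["Smith", "Jones", "Jensen", "Johnson"], top_n=2))
    print(fuzzy_score("Bill", "William"))
```

In this sketch a typographic error is absorbed by the edit distance, a phonetic spelling variation is rewarded when the Soundex codes agree, and a nickname is resolved by expanding the query before comparison. A production engine would likely tune the weight and handle first and last names separately.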

RESULTS

Using the "Challenge Data Set of Marshfield Physician Names," we evaluated the accuracy of the fuzzy-match engine when the correct name appeared among the top ten results (90%) and compared it with exact match (0%), Soundex (24%), Levenshtein distance (59%), and the fuzzy-match engine when the correct name was the top result (71%).
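The "top one" and "top ten" figures can be read as top-k accuracy: the share of test queries whose intended name appears within the first k results. The abstract does not give the exact scoring procedure, so the sketch below is an assumption about how such a figure could be computed. The function `top_k_accuracy`, the canned results, and the test cases are all hypothetical and are not taken from the challenge data set.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def top_k_accuracy(
    cases: Sequence[Tuple[str, str]],
    search: Callable[[str], List[str]],
    k: int,
) -> float:
    """Fraction of (query, expected) pairs where expected is in the first k results."""
    if not cases:
        return 0.0
    hits = sum(1 for query, expected in cases if expected in search(query)[:k])
    return hits / len(cases)


# Hypothetical ranked results standing in for a real search engine.
CANNED_RESULTS: Dict[str, List[str]] = {
    "jonson": ["Johnson", "Jensen"],
    "smyth": ["Smithers", "Smith"],
    "bil": [],
}


def toy_engine(query: str) -> List[str]:
    return CANNED_RESULTS.get(query, [])


# Hypothetical (misspelled query, intended name) pairs.
CASES = [("jonson", "Johnson"), ("smyth", "Smith"), ("bil", "Bill")]

if __name__ == "__main__":
    print(top_k_accuracy(CASES, toy_engine, k=1))
    print(top_k_accuracy(CASES, toy_engine, k=10))
```

With these toy inputs, one of three queries is answered by the first result and two of three within the first ten, which mirrors how a top-ten score can exceed a top-one score for the same engine.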

CONCLUSIONS

We designed and evaluated a fuzzy-match search engine for physician directories and created a reference implementation. The open-source code is available at the CodePlex website, and a reference implementation is available for demonstration at the datamarsh website.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/844b/4288075/690c038bce0e/medinform_v2i2e30_fig1.jpg
