Wilbur W John, Kim Won, Xie Natalie
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, U.S.A.
Inf Retr Boston. 2006 Nov;9(5):543-564. doi: 10.1007/s10791-006-9002-8.
Users of internet search engines often misspell one or more terms in their queries. Several web search engines suggest corrections for misspelled words, but to our knowledge the methods they use are proprietary and unpublished. Here we describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and uses statistics harvested from user logs to estimate the probabilities of the different types of edits that lead to misspellings. We discuss the unique problems encountered in correcting search engine queries and outline our solutions.
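As an illustration of the noisy channel idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It picks the intended term c that maximizes P(c) * P(typed | c), where P(typed | c) depends only on the type of single-character edit. The vocabulary, the term counts, the edit-type probabilities, and the names `single_edits`, `channel_prob`, and `correct` are all hypothetical; the abstract states that edit probabilities are estimated from user logs, whereas the numbers here are invented for the example.

```python
from collections import Counter

# Hypothetical term frequencies standing in for the prior P(c).
TERM_COUNTS = Counter({"cancer": 900, "career": 60, "canker": 40})

# Hypothetical probabilities for each edit type. In the approach described
# above these would be estimated from user logs; these values are invented.
EDIT_PROBS = {"delete": 0.004, "insert": 0.003,
              "substitute": 0.002, "transpose": 0.001}
NO_ERROR_PROB = 0.95  # assumed probability that a term is typed correctly

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edits(word):
    """Yield (typed_string, edit_type) for strings one edit away from word."""
    for i in range(len(word) + 1):
        left, right = word[:i], word[i:]
        if right:
            yield left + right[1:], "delete"
        if len(right) > 1:
            yield left + right[1] + right[0] + right[2:], "transpose"
        for ch in ALPHABET:
            if right and ch != right[0]:
                yield left + ch + right[1:], "substitute"
            yield left + ch + right, "insert"


def channel_prob(typed, intended):
    """Approximate P(typed | intended), allowing at most one edit."""
    if typed == intended:
        return NO_ERROR_PROB
    probs = [EDIT_PROBS[kind]
             for candidate, kind in single_edits(intended)
             if candidate == typed]
    return max(probs, default=0.0)


def correct(typed):
    """Return the vocabulary term maximizing P(c) * P(typed | c)."""
    total = sum(TERM_COUNTS.values())

    def score(term):
        return (TERM_COUNTS[term] / total) * channel_prob(typed, term)

    best = max(TERM_COUNTS, key=score)
    # Leave the query term unchanged when no vocabulary term can explain it.
    return best if score(best) > 0 else typed
```

For example, `correct("cancr")` selects `"cancer"` because a single deletion explains the typed string, while a string with no vocabulary term within one edit is returned unchanged. Enumerating the edits of every vocabulary term is only practical for a toy vocabulary; a real system would need an index over candidate terms.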