Crowell Jonathan, Zeng Qing, Ngo Long, Lacroix Eve-Marie
Decision Systems Group, Brigham & Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
J Am Med Inform Assoc. 2004 May-Jun;11(3):179-85. doi: 10.1197/jamia.M1474. Epub 2004 Feb 5.
There is an abundance of health-related information online, and millions of consumers search for such information. Spell checking is of crucial importance in returning pertinent results, so the authors propose a technique for increasing the effectiveness of spell-checking tools used for health-related information retrieval.
A sample of incorrectly spelled medical terms was submitted to two different spell-checking tools, and the resulting suggestions, derived under two different dictionary configurations, were re-sorted according to how frequently each term appeared in log data from a medical search engine.
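The re-sorting step can be illustrated with a minimal sketch. It assumes the search-engine log has already been reduced to a table of term counts; the function name `resort_by_frequency`, the counts, and the sample suggestion list are all hypothetical and are not taken from the study. The abstract does not say how ties or terms absent from the log were handled, so the choices here (stable ordering, a count of zero for unseen terms) are assumptions.

```python
from collections import Counter


def resort_by_frequency(suggestions, term_counts):
    """Re-sort spell-checker suggestions by descending log frequency.

    Terms missing from the log are treated as having a count of zero
    (an assumption). Python's sort is stable, so suggestions with equal
    counts keep the order the spell checker originally gave them.
    """
    return sorted(suggestions, key=lambda term: -term_counts.get(term.lower(), 0))


# Hypothetical query-log counts, invented for illustration only.
term_counts = Counter({"diabetes": 5200, "diabetic": 1300, "diabetics": 40})

# Hypothetical suggestions for the misspelling "diabetis",
# listed in the spell checker's original order.
suggestions = ["diabetics", "diabetes", "diabetic"]

resorted = resort_by_frequency(suggestions, term_counts)
print(resorted)
```

Under these made-up counts, the most frequently searched term moves to the top of the list, which is the effect the re-sort is meant to produce.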
Univariable analysis was carried out to assess the effect of each factor (spell-checking tool, dictionary type, and whether the suggestions were re-sorted) on the probability of success. The factors that were statistically significant in the univariable analysis were then used in multivariable analysis to evaluate the independent effect of each factor.
The re-sorted suggestions proved to be significantly more accurate than the original list returned by the spell-checking tool. The odds of finding the correct suggestion in the number one rank were increased by 63% after re-sorting using the authors' method. This effect was independent of both the dictionary and the spell-checking tools that were used.
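A 63% increase in odds is not the same as a 63% increase in probability. The sketch below converts an odds ratio of 1.63 into a success probability for a few baseline rates. The baseline probabilities are hypothetical, chosen only to show the arithmetic, since the abstract does not report the underlying rates; the function name is likewise an assumption.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability.

    odds = p / (1 - p); the new odds are the baseline odds times the
    odds ratio, and the new probability is odds / (1 + odds).
    """
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 1.63  # odds increased by 63%, as stated in the abstract

# Hypothetical baseline probabilities of the correct suggestion being
# ranked first before re-sorting (not reported in the abstract).
for baseline in (0.3, 0.5, 0.7):
    improved = probability_after_odds_ratio(baseline, ODDS_RATIO)
    print(f"baseline {baseline:.2f} -> after re-sort {improved:.3f}")
```

The absolute gain in probability depends on the starting point, which is why the result is reported as a change in odds rather than as a fixed percentage-point improvement.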
Using knowledge about the frequency of a given word's occurrence in the medical domain can significantly improve spelling correction for medical queries.