Lin Jimmy, Wilbur W John
The iSchool, College of Information Studies, University of Maryland, College Park, Maryland, USA
Inf Retr Boston. 2008 Sep 12;12:487-503. doi: 10.1007/s10791-008-9067-7.
Transaction logs from online search engines are valuable for two reasons. First, they provide insight into human information-seeking behavior. Second, log data can be used to train user models, which can then be applied to improve retrieval systems. This article presents a study of logs from PubMed®, the public gateway to the MEDLINE® database of bibliographic records from the medical and biomedical primary literature. Unlike most previous studies, which focus on general Web search, our work examines user activities with a highly specialized search engine. We encode user actions as string sequences and model these sequences using n-gram language models. The models are evaluated in terms of perplexity and in a sequence prediction task. They help us better understand how PubMed users search for information and provide a basis for improving users' search experience.
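To make the modeling approach concrete, the following is a minimal sketch of a bigram (n = 2) model over action strings, with perplexity and next-action prediction. It is an illustration under stated assumptions, not the authors' implementation: the action alphabet (`Q` for a query, `V` for viewing a record, `N` for paging to the next results), the toy sessions, the add-one smoothing, and all function names are hypothetical, since the abstract does not specify the encoding or the smoothing method.

```python
import math
from collections import Counter

START, END = "^", "$"  # hypothetical session boundary markers


def train_bigram(sessions, alphabet):
    """Count action bigrams and return an add-one-smoothed probability function."""
    bigrams, contexts = Counter(), Counter()
    for session in sessions:
        padded = START + session + END
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    # Symbols that can be predicted: every action plus the end marker.
    vocab = set(alphabet) | {END}

    def prob(prev, cur):
        # Add-one (Laplace) smoothing is an assumption made for this sketch.
        return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))

    return prob, vocab


def perplexity(prob, sessions):
    """Perplexity = 2 ** (negative average log2 probability per predicted symbol)."""
    log_sum, count = 0.0, 0
    for session in sessions:
        padded = START + session + END
        for prev, cur in zip(padded, padded[1:]):
            log_sum += math.log2(prob(prev, cur))
            count += 1
    return 2 ** (-log_sum / count)


def predict_next(prob, vocab, prev):
    """Return the most probable next symbol given the previous one."""
    return max(sorted(vocab), key=lambda symbol: prob(prev, symbol))


# Toy sessions in the hypothetical encoding (not real PubMed log data).
train_sessions = ["QVV", "QNV", "QVQV", "QQV"]
prob, vocab = train_bigram(train_sessions, "QVN")
print(perplexity(prob, ["QV", "QNV"]))
print(predict_next(prob, vocab, "Q"))
```

Lower perplexity on held-out sessions means the model is less "surprised" by real user behavior, and the prediction function corresponds to the sequence prediction task in its simplest form. Higher-order models condition on longer action histories in the same way.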