University of Missouri Informatics Institute, Columbia, MO 65211, USA.
BMC Med Inform Decis Mak. 2013 Jan 9;13:8. doi: 10.1186/1472-6947-13-8.
The practice of evidence-based medicine requires efficient biomedical literature search, for example through PubMed/MEDLINE. Retrieval performance depends heavily on the efficient use of search field tags. The purpose of this study was to analyze PubMed log data to understand how end users use search tags in PubMed/MEDLINE searches.
A PubMed query log file containing anonymous user identification, timestamp, and query text was obtained from the National Library of Medicine. Inconsistent records were removed from the dataset, and the search tags were extracted from the query texts. A total of 2,917,159 queries issued by 613,061 users were selected for this study. Frequent co-occurrences and usage patterns of the search tags were analyzed using an association mining algorithm.
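The tag extraction and co-occurrence steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regular expression, the function names, and the sample queries are assumptions made for illustration, and the pairwise support/confidence count stands in for whichever association mining algorithm the study actually used.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a search field tag is any bracketed token, e.g. "[au]" or "[dp]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(query):
    """Return the lowercased search field tags found in one query string."""
    return [match.strip().lower() for match in TAG_PATTERN.findall(query)]


def pair_statistics(queries):
    """Compute support and confidence for tag pairs that co-occur in a query."""
    tag_sets = [set(extract_tags(q)) for q in queries]
    total = len(tag_sets)
    # Number of queries containing each tag, and each unordered tag pair.
    single = Counter(tag for tags in tag_sets for tag in tags)
    pairs = Counter(
        pair for tags in tag_sets for pair in combinations(sorted(tags), 2)
    )
    stats = {}
    for (a, b), count in pairs.items():
        stats[(a, b)] = {
            "support": count / total,             # share of all queries with both tags
            "confidence_a_to_b": count / single[a],  # P(b present | a present)
            "confidence_b_to_a": count / single[b],  # P(a present | b present)
        }
    return stats


# Hypothetical queries, not taken from the study's log file.
sample_queries = [
    "smith j[au] AND 2010[dp]",
    "asthma[mh] AND children",
    "smith j[au] AND lancet[ta] AND 2010[dp]",
    "diabetes treatment",
]

# Share of queries that contain at least one tag.
tagged_share = sum(1 for q in sample_queries if extract_tags(q)) / len(sample_queries)
stats = pair_statistics(sample_queries)
```

On a real log, a dedicated frequent-itemset implementation would scale better than enumerating pairs per query, and it would also cover itemsets of more than two tags.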
Search tag usage was low (11.38% of all queries), and only 2.95% of queries contained two or more tags. Three out of four users used no search tag, and about two-thirds of them issued fewer than four queries. Among the queries containing at least one tagged search term, the average number of search tags was almost half the total number of search terms. Navigational search tags were used more frequently than informational search tags. While no strong association was observed between informational and navigational tags, six (out of 19) informational tags and six (out of 29) navigational tags showed strong associations in PubMed searches.
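To give a sense of scale, the reported percentages can be converted back into approximate query counts. The sketch below uses only the totals and percentages stated in the abstract; because those percentages are rounded, the resulting counts are estimates rather than figures reported by the study, and the variable names are illustrative.

```python
# Totals reported in the abstract.
TOTAL_QUERIES = 2_917_159
TOTAL_USERS = 613_061

# Approximate counts implied by the rounded percentages.
tagged_queries = round(TOTAL_QUERIES * 0.1138)     # queries with at least one tag
multi_tag_queries = round(TOTAL_QUERIES * 0.0295)  # queries with two or more tags

# Mean number of queries issued per user in the selected dataset.
queries_per_user = TOTAL_QUERIES / TOTAL_USERS

print(tagged_queries, multi_tag_queries, round(queries_per_user, 2))
```

This works out to roughly 332,000 tagged queries, roughly 86,000 queries with two or more tags, and just under five queries per user on average.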
The low rate of search tag usage implies that PubMed/MEDLINE users do not make wide use of these features, are not aware of them, or depend solely on the high-recall query translation performed by PubMed's Automatic Term Mapping. Users need further education and interactive search applications to use search tags effectively and fulfill their biomedical information needs from PubMed/MEDLINE.