Tanaka LY, Herskovic JR, Iyengar MS, Bernstam EV
School of Health Information Sciences, The University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, TX 77030, USA.
J Biomed Inform. 2009 Aug;42(4):678-84. doi: 10.1016/j.jbi.2009.02.009. Epub 2009 Mar 9.
Information overload is a problem for users of MEDLINE, the biomedical literature database that indexes over 17 million articles. Various techniques have been developed to retrieve high-quality or important articles. Some techniques use the number of citations as a measure of an article's importance. Unfortunately, citation information is proprietary, expensive, and suffers from "citation lag": citations accumulate only over time after publication, so recently published articles have few. MEDLINE users have a variety of information needs. Although some users require high recall, many users are looking for a "few good articles" on a topic. For these users, precision is more important than recall. We present and evaluate a method for identifying articles likely to be highly cited by using information available at the time of listing in MEDLINE. The method uses a score based on Medical Subject Headings (MeSH) terms, journal impact factor (JIF), and number of authors. This method can filter large MEDLINE result sets (>1000 articles) returned by actual user queries to produce small, highly cited result sets.
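The abstract does not state how the MeSH, JIF, and author-count features are combined into a score. The sketch below is a hypothetical illustration only, not the authors' model: it assumes a simple weighted sum with made-up weights (`mesh_weights`, `w_jif`, `w_authors`), ranks a result set by that score, keeps the top k articles, and then measures precision and recall against a set of articles assumed to be known as highly cited. All names and values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Article:
    pmid: str
    mesh_terms: set[str]
    journal_impact_factor: float
    n_authors: int


def score(article: Article, mesh_weights: dict[str, float],
          w_jif: float = 1.0, w_authors: float = 0.1) -> float:
    """Hypothetical weighted-sum score; weights are illustrative, not from the paper."""
    mesh_part = sum(mesh_weights.get(t, 0.0) for t in article.mesh_terms)
    return mesh_part + w_jif * article.journal_impact_factor + w_authors * article.n_authors


def filter_top_k(articles: list[Article], k: int, mesh_weights: dict[str, float],
                 w_jif: float = 1.0, w_authors: float = 0.1) -> list[Article]:
    """Shrink a large result set to the k highest-scoring articles."""
    ranked = sorted(articles, key=lambda a: score(a, mesh_weights, w_jif, w_authors),
                    reverse=True)
    return ranked[:k]


def precision_recall(retrieved, relevant) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|; recall = ... / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy result set and hypothetical MeSH weights.
articles = [
    Article("1", {"Neoplasms"}, 30.0, 12),
    Article("2", {"Humans"}, 2.0, 3),
    Article("3", {"Neoplasms", "Humans"}, 10.0, 8),
    Article("4", set(), 1.0, 1),
]
mesh_weights = {"Neoplasms": 1.0, "Humans": 0.0}

top = filter_top_k(articles, 2, mesh_weights)
print([a.pmid for a in top])  # ['1', '3']

# Assume articles 1, 2, and 3 turned out to be highly cited.
p, r = precision_recall([a.pmid for a in top], {"1", "2", "3"})
print(p, r)
```

Keeping only the top k trades recall for precision, which matches the "few good articles" use case the abstract describes. In this toy example the JIF term dominates because it is unscaled; a real scorer would need calibrated or learned weights.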