Lu Zhiyong, Kim Won, Wilbur W John
NCBI/NLM/NIH, 8600 Rockville Pike, Bethesda, MD 20852, USA.
J Am Med Inform Assoc. 2009 Jan-Feb;16(1):32-6. doi: 10.1197/jamia.M2935. Epub 2008 Oct 24.
This paper evaluates the retrieval effectiveness of relevance ranking strategies on a collection of 55 queries and about 160,000 MEDLINE® citations used in the 2006 and 2007 Text Retrieval Conference (TREC) Genomics Tracks. The authors study two relevance ranking strategies, term frequency-inverse document frequency (TF-IDF) weighting and sentence-level co-occurrence, and examine how well each ranks the MEDLINE documents retrieved for a user query. Reverse chronological order, PubMed's default display option, serves as the baseline for comparison. Retrieval effectiveness is assessed using both mean average precision and mean rank precision. Experimental results show that both strategies outperform the baseline and that, of the two, TF-IDF weighting is more effective at retrieving relevant documents.
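As an illustration of the two ideas the abstract names, TF-IDF ranking and mean average precision, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the exact weighting formula, so the score here assumes a common textbook variant (term frequency times log(N/df)); the function names, the whitespace tokenizer, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def tfidf_rank(query, docs):
    """Return document ids sorted by descending TF-IDF score for the query.

    docs maps a document id to its text. The score is the sum over query
    terms of tf(term, doc) * log(N / df(term)), an assumed textbook variant
    that is not necessarily the weighting used in the paper.
    """
    counts_by_doc = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for counts in counts_by_doc.values():
        df.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in counts_by_doc.items():
        scores[doc_id] = sum(
            counts[t] * math.log(n_docs / df[t])
            for t in query_terms
            if df[t] > 0  # skip query terms absent from the collection
        )

    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Average of precision@k over the ranks k that hold a relevant document."""
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevance):
    """Mean of per-query average precision; both dicts are keyed by query id."""
    return sum(average_precision(rankings[q], relevance[q]) for q in rankings) / len(rankings)


# Toy collection (hypothetical data, not from the TREC Genomics Tracks).
docs = {
    "d1": "brca1 gene mutation breast cancer",
    "d2": "weather report",
    "d3": "brca1 brca1 mutation",
}
ranking = tfidf_rank("brca1 mutation", docs)
```

On this toy collection, `ranking` places `d3` first because it repeats a query term, then `d1`, then `d2`, which matches no query term. Passing such rankings and a set of judged-relevant ids per query to `mean_average_precision` gives a single effectiveness figure that can be compared across ranking strategies, for example against a ranking sorted by publication date.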