Hersh W R, Hickam D H, Leone T J
Biomedical Information Communication Center, Oregon Health Sciences University, Portland.
Proc Annu Symp Comput Appl Med Care. 1992:644-8.
What is the best way to represent the content of documents in an information retrieval system? This study compares the retrieval effectiveness of five different methods for automated (machine-assigned) indexing using three test collections. The methods that performed consistently best were those that index on the words occurring in the available text of each document. Methods that map text into concepts from a controlled vocabulary showed no advantage over the word-based methods. The study also evaluated an approach to relevance feedback, which showed benefit for both word-based and concept-based methods.
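To make the word-based approach and the idea of relevance feedback concrete, here is a minimal Python sketch. It is an illustration only, not the methods evaluated in the study: the TF-IDF weighting, the positive-only Rocchio-style feedback, the `beta` parameter, the function names, and the four toy documents are all assumptions introduced for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term frequencies and inverse document frequencies."""
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    n = len(docs)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return tfs, idf


def rank(query_weights, tfs, idf):
    """Rank documents by the weighted TF-IDF sum over the query terms."""
    scores = {}
    for doc_id, tf in tfs.items():
        score = sum(w * tf[t] * idf.get(t, 0.0) for t, w in query_weights.items())
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


def feedback(query_weights, relevant_ids, tfs, beta=0.5):
    """Expand the query with terms from documents the user judged relevant.

    Assumption: a positive-only Rocchio-style update; the study's actual
    feedback method is not described in the abstract.
    """
    updated = Counter(query_weights)
    for doc_id in relevant_ids:
        for term, count in tfs[doc_id].items():
            updated[term] += beta * count / len(relevant_ids)
    return dict(updated)


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "aspirin reduces risk of myocardial infarction",
    "d2": "heart attack prevention with aspirin therapy",
    "d3": "diabetes management with insulin",
    "d4": "therapy after heart attack",
}
tfs, idf = build_index(docs)

query = {"aspirin": 1.0, "infarction": 1.0}
initial = rank(query, tfs, idf)

# Suppose the user marks d2 as relevant; its words are added to the query.
expanded = feedback(query, ["d2"], tfs)
reranked = rank(expanded, tfs, idf)
```

In this toy run the original query shares no words with `d4`, so `d4` is not retrieved at first; after feedback on `d2`, the expanded query contains "heart", "attack", and "therapy", and `d4` enters the ranking. A concept-based method would instead map both the query and the documents to controlled-vocabulary terms before matching, a step this sketch does not attempt.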