Humphrey Susanne M
Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD 20894.
J Am Soc Inf Sci. 1999;50(8):661-674. doi: 10.1002/(SICI)1097-4571(1999)50:8<661::AID-ASI4>3.0.CO;2-R.
A new, fully automated approach to indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of the journals in which those citations appear. This journal-level indexing takes the form of a consistent, timely set of journal descriptors (JDs) that index the individual journals themselves, and it is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of hundreds of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals, for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, such as journal articles, monographs, Web documents, and reports from the grey literature, and could therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
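The association between textwords and journal descriptors described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the function names `train` and `rank_jds`, whitespace tokenization, and the score itself (for each word, the fraction of training citations containing it whose journal carries a given JD, averaged over the known words of the new document). The journals, descriptors, and citation texts are invented toy data.

```python
from collections import Counter, defaultdict

def train(citations, journal_jds):
    """Count word/JD co-occurrences (hypothetical helper, not the paper's code).

    citations:   iterable of (journal_id, text) pairs
    journal_jds: dict mapping journal_id -> list of journal descriptors
    Only journal-level indexing is used; no per-document manual indexing.
    """
    word_doc_count = Counter()            # citations containing each word
    word_jd_count = defaultdict(Counter)  # word -> JD -> citation count
    for journal_id, text in citations:
        jds = journal_jds.get(journal_id, [])
        for word in set(text.lower().split()):
            word_doc_count[word] += 1
            for jd in jds:
                word_jd_count[word][jd] += 1
    return word_doc_count, word_jd_count

def rank_jds(text, model):
    """Rank JDs for a new document by averaging per-word JD fractions.

    The averaging scheme is an assumption chosen for simplicity.
    """
    word_doc_count, word_jd_count = model
    words = [w for w in set(text.lower().split()) if w in word_doc_count]
    scores = Counter()
    for word in words:
        for jd, n in word_jd_count[word].items():
            scores[jd] += n / word_doc_count[word]
    if words:
        for jd in scores:
            scores[jd] /= len(words)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data, invented for illustration only.
journal_jds = {"J1": ["Cardiology"], "J2": ["Neurology"]}
citations = [
    ("J1", "heart failure outcomes"),
    ("J1", "heart valve surgery"),
    ("J2", "brain stroke outcomes"),
]
model = train(citations, journal_jds)
ranking = rank_jds("stroke outcomes", model)
```

With this toy data, `ranking` places `Neurology` ahead of `Cardiology`, because "stroke" appears only in a citation from the neurology journal while "outcomes" is shared between both. A realistic system would also need stopword handling and a much larger training set.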