Chute C G, Yang Y
Department of Health Sciences Research, Mayo Foundation, Rochester, Minn., USA.
Methods Inf Med. 1995 Mar;34(1-2):104-10.
Statistical methods that support text retrieval are an increasing focus of medical informatics. We review our adaptation of existing knowledge sources to create pseudo-documents for concept-based latent semantic indexing. Experience showed this approach to be of limited practical value, since retrieval performance was invariably unsatisfactory. We found that this was due in part to a vocabulary gap introduced between the queries and the cases we sought to retrieve. Partly to address this problem, and to make use of our large body of human-coded text as a knowledge source, we developed a least-squares-fit alternative for the computer-assisted indexing and retrieval of biomedical texts. This technique demonstrates equivalent or superior retrieval performance when compared to all other textual retrieval techniques. It does not depend on elaborate knowledge bases, lexicons, or thesauri. It is a promising technique for classifying and retrieving large volumes of clinical text.
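The abstract does not give the details of the least squares fit, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes a set of human-coded training texts, represents each text as a word-count vector, and solves a linear least-squares problem for a word-to-category weight matrix. The vocabulary, the category codes (`MI`, `LUNG_CA`), the training pairs, and the function names are all hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary and category codes (not from the paper).
vocab = ["heart", "attack", "myocardial", "infarction",
         "lung", "cancer", "carcinoma"]
categories = ["MI", "LUNG_CA"]

# Hypothetical human-coded training texts: (free text, assigned codes).
training = [
    ("heart attack", ["MI"]),
    ("myocardial infarction", ["MI"]),
    ("lung cancer", ["LUNG_CA"]),
    ("carcinoma lung", ["LUNG_CA"]),
]


def text_to_vector(text):
    """Bag-of-words count vector over the fixed vocabulary."""
    words = text.lower().split()
    return np.array([float(words.count(w)) for w in vocab])


def fit_mapping(pairs):
    """Solve min_W ||A W - B||_F for a (terms x categories) matrix W."""
    # A: one row per training text, one column per vocabulary word.
    A = np.stack([text_to_vector(text) for text, _ in pairs])
    # B: one row per training text, one column per category (1 = assigned).
    B = np.array([[1.0 if c in codes else 0.0 for c in categories]
                  for _, codes in pairs])
    # lstsq returns the minimum-norm least-squares solution.
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W


def rank_categories(text, W):
    """Score every category for a query and return them best first."""
    scores = text_to_vector(text) @ W
    order = np.argsort(-scores)
    return [(categories[i], float(scores[i])) for i in order]


W = fit_mapping(training)

# The query mixes words from two differently worded training texts.
print(rank_categories("heart infarction", W))
print(rank_categories("lung carcinoma", W))
```

Because the weights are learned from coded examples, a query can be matched to a category even when its wording differs from any single training text, which is one way a learned mapping can narrow the vocabulary gap described above. No lexicon or thesaurus is consulted in this sketch.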