
An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts.

Author information

Wilbur W J, Yang Y

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Comput Biol Med. 1996 May;26(3):209-22. doi: 10.1016/0010-4825(95)00055-0.

Abstract

The biological literature presents a difficult challenge to information processing through its complexity, its diversity, and its sheer volume. Much of the diversity resides in its technical terminology, which has also become voluminous. In an effort to deal more effectively with this large vocabulary and improve information processing, a method of focus has been developed that allows one to classify terms based on a measure of their importance in describing the content of the documents in which they occur. The measurement is called the strength of a term and is a measure of how strongly the term's occurrences correlate with the subjects of documents in the database. If term occurrences are random, then there will be no correlation and the strength will be zero; but if, for any subject, the term is either always present or never present, its strength will be one. We give here a new, information-theoretical interpretation of term strength, review some of its uses in focusing the processing of documents for information retrieval, and describe new results obtained in document categorization.
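The abstract states two boundary properties of term strength (zero when a term occurs at random, one when a term is always or never present within each subject) but gives no formula. The Python sketch below shows one way a measure with those properties could be computed; it is an assumption, not the paper's definition. It takes the raw strength to be the probability that a term occurring in one document also occurs in a different document on the same subject, then rescales it against the value expected if the term were scattered at random, `(df - 1) / (N - 1)`. The explicit subject labels, the function name `term_strength`, and the toy data are all hypothetical.

```python
from itertools import permutations
from typing import Hashable, Sequence, Set


def term_strength(term: str, docs: Sequence[Set[str]],
                  subjects: Sequence[Hashable]) -> float:
    """Hypothetical normalized term strength (a sketch, not the paper's formula).

    Raw strength: P(term in y | term in x) over ordered pairs of distinct
    documents x, y that share a subject label. The result is rescaled so that
    random occurrence gives about 0 and perfect subject agreement gives 1.
    """
    n = len(docs)
    with_term = 0  # same-subject pairs where the term occurs in x
    both = 0       # ... and also occurs in y
    for i, j in permutations(range(n), 2):
        if subjects[i] != subjects[j] or term not in docs[i]:
            continue
        with_term += 1
        if term in docs[j]:
            both += 1
    if with_term == 0:
        return 0.0  # term never co-occurs with a same-subject partner to test
    raw = both / with_term

    # Assumed baseline: chance that another randomly chosen document
    # contains the term, given that x does.
    df = sum(term in d for d in docs)
    baseline = (df - 1) / (n - 1)
    if baseline >= 1.0:
        return 0.0  # term is in every document, so it cannot separate subjects
    return (raw - baseline) / (1.0 - baseline)


if __name__ == "__main__":
    # Hypothetical toy corpus: two documents on subject "A", two on "B".
    docs = [{"kinase", "protein"}, {"kinase", "cell"},
            {"dna", "cell"}, {"dna", "protein"}]
    subjects = ["A", "A", "B", "B"]
    for t in ("kinase", "protein"):
        print(t, round(term_strength(t, docs, subjects), 3))
```

In this toy corpus `kinase` appears only in the two subject-A documents and scores 1.0, while `protein` is split across subjects and scores about -0.5. This particular normalization lets anti-correlated terms fall below zero, a case the abstract's zero-to-one description does not address.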
