An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts.

Authors

Wilbur W J, Yang Y

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Comput Biol Med. 1996 May;26(3):209-22. doi: 10.1016/0010-4825(95)00055-0.

DOI: 10.1016/0010-4825(95)00055-0
PMID: 8725772

Abstract

The biological literature presents a difficult challenge to information processing in its complexity, diversity, and in its sheer volume. Much of the diversity resides in its technical terminology, which has also become voluminous. In an effort to deal more effectively with this large vocabulary and improve information processing, a method of focus has been developed which allows one to classify terms based on a measure of their importance in describing the content of the documents in which they occur. The measurement is called the strength of a term and is a measure of how strongly the term's occurrences correlate with the subjects of documents in the database. If term occurrences are random then there will be no correlation and the strength will be zero, but if for any subject, the term is either always present or never present its strength will be one. We give here a new, information theoretical interpretation of term strength, review some of its uses in focusing the processing of documents for information retrieval and describe new results obtained in document categorization.
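
The abstract does not give a formula, so the following is only a rough sketch of one way a score with the stated end points could be estimated. It assumes, hypothetically, that "same subject" is approximated by a list of related document pairs, and it rescales the conditional probability that a term occurs in one document of a pair, given that it occurs in the other, against the term's overall document frequency. The function name `term_strength`, the pair-based setup, and the rescaling are illustrative assumptions, not the authors' published definition.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def term_strength(
    docs: Dict[str, Set[str]],
    related_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Illustrative strength score (an assumption, not the paper's formula).

    cond = P(term in y | term in x) over related pairs (x, y)
    base = fraction of all documents containing the term
    score = (cond - base) / (1 - base)

    The score is 0 when co-occurrence in related pairs matches chance and
    1 when the term always recurs in the related document. It can be
    negative if the term recurs less often than chance.
    """
    n_docs = len(docs)
    doc_freq = Counter(t for terms in docs.values() for t in terms)
    in_first: Counter = Counter()
    in_both: Counter = Counter()

    for a, b in related_pairs:
        # Count both orderings so the estimate is symmetric in the pair.
        for x, y in ((a, b), (b, a)):
            for t in docs[x]:
                in_first[t] += 1
                if t in docs[y]:
                    in_both[t] += 1

    scores: Dict[str, float] = {}
    for t, n in in_first.items():
        cond = in_both[t] / n
        base = doc_freq[t] / n_docs
        # A term present in every document carries no subject information.
        scores[t] = 0.0 if base >= 1.0 else (cond - base) / (1.0 - base)
    return scores


# Toy data: two subjects, two documents each (entirely made up).
toy_docs = {
    "d1": {"the", "kinase"},
    "d2": {"the", "kinase"},
    "d3": {"the", "ribosome"},
    "d4": {"the", "ribosome"},
}
toy_pairs = [("d1", "d2"), ("d3", "d4")]
toy_scores = term_strength(toy_docs, toy_pairs)
```

On this toy data, a term that appears in every document ("the") scores 0, while a term that tracks a single subject ("kinase") scores 1, matching the two end points described in the abstract.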

Similar articles

1. An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts.
   Comput Biol Med. 1996 May;26(3):209-22. doi: 10.1016/0010-4825(95)00055-0.
2. Words or concepts: the features of indexing units and their optimal use in information retrieval.
   Proc Annu Symp Comput Appl Med Care. 1993:685-9.
3. Ranking the whole MEDLINE database according to a large training set using text indexing.
   BMC Bioinformatics. 2005 Mar 24;6:75. doi: 10.1186/1471-2105-6-75.
4. Selective dissemination and indexing of scientific information.
   Science. 1971 Jul 23;173(3994):300-8. doi: 10.1126/science.173.3994.300.
5. Words, concepts, or both: optimal indexing units for automated information retrieval.
   Proc Annu Symp Comput Appl Med Care. 1992:644-8.
6. Automatic MeSH term assignment and quality assessment.
   Proc AMIA Symp. 2001:319-23.
7. A MEDLINE categorization algorithm.
   BMC Med Inform Decis Mak. 2006 Feb 7;6:7. doi: 10.1186/1472-6947-6-7.
8. An evaluation of statistical approaches to MEDLINE indexing.
   Proc AMIA Annu Fall Symp. 1996:358-62.
9. An application of Expert Network to clinical classification and MEDLINE indexing.
   Proc Annu Symp Comput Appl Med Care. 1994:157-61.
10. Creating and indexing teaching files from free-text patient reports.
    Proc AMIA Symp. 1999:814-8.

Cited by

1. Scientometric Study of Research in Information Retrieval in Medical Sciences.
   Med J Islam Repub Iran. 2022 Jun 16;36:65. doi: 10.47176/mjiri.36.65. eCollection 2022.
2. Computational Methods for Identifying Similar Diseases.
   Mol Ther Nucleic Acids. 2019 Dec 6;18:590-604. doi: 10.1016/j.omtn.2019.09.019. Epub 2019 Sep 28.
3. Modular organization of the human disease genes: a text-based network inference.
   Bioinformation. 2015 Sep 30;11(9):432-6. doi: 10.6026/97320630011432. eCollection 2015.
4. Using a search engine-based mutually reinforcing approach to assess the semantic relatedness of biomedical terms.
   PLoS One. 2013 Nov 13;8(11):e77868. doi: 10.1371/journal.pone.0077868. eCollection 2013.
5. CoIN: a network analysis for document triage.
   Database (Oxford). 2013 Nov 11;2013:bat076. doi: 10.1093/database/bat076. Print 2013.
6. Phosphoproteomics identifies oncogenic Ras signaling targets and their involvement in lung adenocarcinomas.
   PLoS One. 2011;6(5):e20199. doi: 10.1371/journal.pone.0020199. Epub 2011 May 26.
7. Towards a framework for developing semantic relatedness reference standards.
   J Biomed Inform. 2011 Apr;44(2):251-65. doi: 10.1016/j.jbi.2010.10.004. Epub 2010 Oct 31.
8. Natural Language Processing methods and systems for biomedical ontology learning.
   J Biomed Inform. 2011 Feb;44(1):163-79. doi: 10.1016/j.jbi.2010.07.006. Epub 2010 Jul 18.
9. Author Name Disambiguation in MEDLINE.
   ACM Trans Knowl Discov Data. 2009 Jul 1;3(3). doi: 10.1145/1552303.1552304.
10. Improving classification in protein structure databases using text mining.
    BMC Bioinformatics. 2009 May 5;10:129. doi: 10.1186/1471-2105-10-129.