Comparing a Rule Based vs. Statistical System for Automatic Categorization of MEDLINE Documents According to Biomedical Specialty.

Author information

Humphrey Susanne M, Névéol Aurélie, Gobeil Julien, Ruch Patrick, Darmoni Stéfan J, Browne Allen

Affiliation

U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA. Tel: +1 (301) 435-9026.

Publication information

J Am Soc Inf Sci Technol. 2009 Dec 1;60(12):2530-2539. doi: 10.1002/asi.21170.

Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which assigns metaterms (MTs) using rules based on human indexing of the documents with the Medical Subject Headings® (MeSH®) controlled vocabulary, and Journal Descriptor Indexing (JDI), which is based on human categorization of about 4,000 journals and on statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of human-assigned categories for one hundred MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures and that JDI is superior for one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF could be combined, and to evaluate whether the two are complementary.
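
The contrast between the two approaches can be illustrated with a minimal sketch. It is not the CISMeF or JDI implementation: the rule table, the function names, and the scoring formula (the fraction of training documents containing a word whose journal carries a given descriptor, averaged over the words of a new document) are all assumptions chosen for illustration, and the data are toy values.

```python
from collections import Counter, defaultdict

# Hypothetical rule base mapping MeSH headings to metaterms (MTs).
# These entries are illustrative only, not actual CISMeF rules.
MESH_TO_MT = {
    "Myocardial Infarction": ["cardiology"],
    "Asthma": ["pneumology", "allergy and immunology"],
}


def rule_based_categories(mesh_headings):
    """Rule-based route: look up each human-assigned MeSH heading in the rule table."""
    mts = []
    for heading in mesh_headings:
        for mt in MESH_TO_MT.get(heading, []):
            if mt not in mts:  # keep first-seen order, drop duplicates
                mts.append(mt)
    return mts


def train_word_jd_scores(training_docs):
    """Statistical route, training step (assumed formula).

    training_docs is a list of (words, journal_descriptors) pairs, where the
    descriptors are those of the journal the document appeared in.
    Returns word -> {JD: fraction of documents containing the word that carry the JD}.
    """
    word_doc_count = Counter()
    word_jd_count = defaultdict(Counter)
    for words, jds in training_docs:
        for word in set(words):
            word_doc_count[word] += 1
            for jd in set(jds):
                word_jd_count[word][jd] += 1
    return {
        word: {jd: n / word_doc_count[word] for jd, n in jd_counts.items()}
        for word, jd_counts in word_jd_count.items()
    }


def rank_jds(words, word_jd_scores):
    """Statistical route, scoring step: average per-word JD scores and rank the JDs."""
    known = [w for w in words if w in word_jd_scores]
    if not known:
        return []
    totals = defaultdict(float)
    for word in known:
        for jd, score in word_jd_scores[word].items():
            totals[jd] += score
    averaged = [(jd, total / len(known)) for jd, total in totals.items()]
    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(averaged, key=lambda pair: (-pair[1], pair[0]))
```

The rule-based function needs human MeSH indexing for every document plus a maintained rule table, whereas the statistical function needs only the document's words and a one-time categorization of journals, which is the overhead difference the abstract points to.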
