Automatic assignment of biomedical categories: toward a generic approach.

Author information

Ruch Patrick

Affiliation

University Hospitals of Geneva, Medical Informatics Service CH-1201, Geneva.

Publication information

Bioinformatics. 2006 Mar 15;22(6):658-64. doi: 10.1093/bioinformatics/bti783. Epub 2005 Nov 15.

Abstract

MOTIVATION

We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.

METHODS

To evaluate the robustness of our approach, we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units.
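To make the two-module design concrete, here is a minimal sketch of how a pattern matcher and a vector space ranker could be combined to rank category labels for an input text. It is an illustration under stated assumptions, not the system described in the abstract: the function names, the TF-IDF weighting, the linear mixing weight `alpha`, and the sample labels are all hypothetical, and the sketch omits the stemming and linguistically-motivated indexing units the abstract mentions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming here)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vector_space_scores(text, categories):
    """Cosine similarity between the input text and each category label.

    Assumption: a smoothed TF-IDF weighting computed over the labels themselves.
    """
    docs = {c: Counter(tokenize(c)) for c in categories}
    n = len(docs)
    df = Counter(t for counts in docs.values() for t in counts)
    idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def weigh(counts):
        # Terms that never occur in any label carry no weight.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    query = weigh(Counter(tokenize(text)))
    q_norm = norm(query)
    scores = {}
    for c, counts in docs.items():
        vec = weigh(counts)
        dot = sum(w * query.get(t, 0.0) for t, w in vec.items())
        denom = q_norm * norm(vec)
        scores[c] = dot / denom if denom else 0.0
    return scores


def pattern_scores(text, categories):
    """Score 1.0 when the whole label occurs as a contiguous phrase, else 0.0."""
    padded = " " + " ".join(tokenize(text)) + " "
    return {
        c: 1.0 if " " + " ".join(tokenize(c)) + " " in padded else 0.0
        for c in categories
    }


def rank_categories(text, categories, alpha=0.5):
    """Rank labels by a linear mix of both module scores (alpha is hypothetical)."""
    vs = vector_space_scores(text, categories)
    pm = pattern_scores(text, categories)
    combined = {c: alpha * pm[c] + (1 - alpha) * vs[c] for c in categories}
    return sorted(categories, key=lambda c: combined[c], reverse=True)


if __name__ == "__main__":
    labels = ["Apoptosis", "Cell Proliferation", "Gene Expression"]
    print(rank_categories("Induction of apoptosis in tumour cells.", labels))
```

Because neither module is fitted to annotated examples, a ranker of this kind needs only the terminology itself, which is the sense in which such an approach is largely data-independent.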

RESULTS AND CONCLUSION

Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe that the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to below 20% for GO, establishing a new baseline for categorizers based on retrieval methods.
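The abstract does not specify how precision at high ranks was computed. One common reading is precision at a cutoff k: the fraction of the top k ranked categories that are correct. The sketch below shows that calculation; the function name and the sample labels are hypothetical and do not come from the paper.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked categories that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for category in ranked[:k] if category in relevant)
    return hits / k


if __name__ == "__main__":
    ranked = ["Apoptosis", "Neoplasms", "Mice"]  # hypothetical system output
    relevant = {"Apoptosis", "Mice"}             # hypothetical gold annotations
    for k in (1, 2, 3):
        print(k, precision_at_k(ranked, relevant, k))
```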
