Automatic assignment of biomedical categories: toward a generic approach.

Author information

Ruch Patrick

Affiliation

University Hospitals of Geneva, Medical Informatics Service CH-1201, Geneva.

Publication details

Bioinformatics. 2006 Mar 15;22(6):658-64. doi: 10.1093/bioinformatics/bti783. Epub 2005 Nov 15.

DOI:10.1093/bioinformatics/bti783

PMID:16287934

Abstract

MOTIVATION

We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.

METHODS

In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units.
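The abstract only names the building blocks: a pattern matcher, a vector space retrieval engine, and stems plus linguistically motivated indexing units. The Python sketch below shows how two such ranking modules could be fused. It is not the author's implementation. It assumes a toy three-entry vocabulary with hypothetical IDs (`D001` to `D003`), a crude suffix-stripping stemmer standing in for a real one, adjacent-stem bigrams standing in for phrases, and a hypothetical linear weight `alpha` for combining the two scores.

```python
import math
import re
from collections import Counter

# Hypothetical toy vocabulary: category ID -> preferred term.
# Real MeSH or GO entries carry far more terms and synonyms.
VOCAB = {
    "D001": "breast neoplasms",
    "D002": "gene expression regulation",
    "D003": "protein binding",
}


def stem(token):
    # Crude suffix stripping; a stand-in for a real stemmer (assumption).
    for suffix in ("ations", "ation", "ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def index_units(text):
    # Indexing units: single stems plus adjacent-stem bigrams, the latter
    # standing in for linguistically motivated phrases (assumption).
    stems = [stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    return stems + [a + "_" + b for a, b in zip(stems, stems[1:])]


def pattern_score(text_stems, term_stems):
    # Pattern matcher: 1.0 if the term's stems occur contiguously in the text.
    n = len(term_stems)
    for i in range(len(text_stems) - n + 1):
        if text_stems[i:i + n] == term_stems:
            return 1.0
    return 0.0


def build_idf(vocab):
    # Each category term is treated as one "document" of the collection.
    df = Counter()
    for term in vocab.values():
        df.update(set(index_units(term)))
    n = len(vocab)
    return {u: math.log(1 + n / c) for u, c in df.items()}


def vsm_score(query_units, term_units, idf):
    # Vector-space engine: cosine similarity of tf-idf vectors.
    q = {u: tf * idf.get(u, 0.0) for u, tf in Counter(query_units).items()}
    d = {u: tf * idf.get(u, 0.0) for u, tf in Counter(term_units).items()}
    dot = sum(w * d.get(u, 0.0) for u, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def categorize(text, vocab, alpha=0.5, k=3):
    # Fuse the two ranking modules with a hypothetical linear weight alpha
    # and return the k best (category ID, score) pairs.
    idf = build_idf(vocab)
    units = index_units(text)
    text_stems = [u for u in units if "_" not in u]
    scored = []
    for cat_id, term in vocab.items():
        term_units = index_units(term)
        term_stems = [u for u in term_units if "_" not in u]
        score = (alpha * pattern_score(text_stems, term_stems)
                 + (1 - alpha) * vsm_score(units, term_units, idf))
        scored.append((cat_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


print(categorize("Mutations in these genes drive breast neoplasms "
                 "and alter gene expression.", VOCAB))
```

With this toy input the cosine score alone ranks `D002` slightly above `D001`, because "gene" occurs twice in the text. The exact-phrase bonus from the pattern matcher moves `D001` to the top. This illustrates why combining the two modules can help, but the real system's weighting scheme is not given in the abstract.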

RESULTS AND CONCLUSION

Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods.
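The abstract reports "precision at high ranks" without defining the exact measure. One common reading is precision among the top k ranked categories, averaged over test documents. The Python sketch below computes that standard quantity. The function names and the example data are hypothetical, and the paper's actual evaluation protocol may differ, for example by using interpolated precision.

```python
def precision_at_k(ranked, gold, k):
    # Fraction of the top-k proposed categories that are in the gold set.
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c in gold) / len(top)


def mean_precision_at_k(runs, k):
    # runs: one (ranked category IDs, set of gold category IDs) pair per document.
    return sum(precision_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Hypothetical example: two documents, each with one correct category.
runs = [(["D001", "D002"], {"D001"}), (["D003", "D001"], {"D001"})]
print(mean_precision_at_k(runs, 1), mean_precision_at_k(runs, 2))
```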

Similar articles

Using discourse analysis to improve text categorization in MEDLINE.

Stud Health Technol Inform. 2007;129(Pt 1):710-5.

Exploring supervised and unsupervised methods to detect topics in biomedical text.

BMC Bioinformatics. 2006 Mar 16;7:140. doi: 10.1186/1471-2105-7-140.

Automatic term list generation for entity tagging.

Bioinformatics. 2006 Mar 15;22(6):651-7. doi: 10.1093/bioinformatics/bti733. Epub 2005 Oct 25.

Automatic extension of Gene Ontology with flexible identification of candidate terms.

Bioinformatics. 2006 Mar 15;22(6):665-70. doi: 10.1093/bioinformatics/btl010. Epub 2006 Jan 21.

Bioinformatics. 2006 Sep 15;22(18):2298-304. doi: 10.1093/bioinformatics/btl388. Epub 2006 Aug 22.

Font adaptive word indexing of modern printed documents.

IEEE Trans Pattern Anal Mach Intell. 2006 Aug;28(8):1187-99. doi: 10.1109/TPAMI.2006.162.

Automatic extraction of gene/protein biological functions from biomedical text.

Bioinformatics. 2005 Apr 1;21(7):1227-36. doi: 10.1093/bioinformatics/bti084. Epub 2004 Oct 27.

Discovering patterns to extract protein-protein interactions from the literature: Part II.

Bioinformatics. 2005 Aug 1;21(15):3294-300. doi: 10.1093/bioinformatics/bti493. Epub 2005 May 12.

GeneInfoMiner--a web server for exploring biomedical literature using batch sequence ID.

Bioinformatics. 2005 Aug 15;21(16):3452-3. doi: 10.1093/bioinformatics/bti559. Epub 2005 Jun 30.

Cited by

Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts.

Comput Intell Neurosci. 2023 Feb 15;2023:2989791. doi: 10.1155/2023/2989791. eCollection 2023.

Learning From eHealth Implementations Through "Implementomics": A Multidimensional Annotation Model Applied to eHealth Projects of the RAFT Network.

Front Public Health. 2019 Jul 5;7:188. doi: 10.3389/fpubh.2019.00188. eCollection 2019.

Beyond opinion classification: Extracting facts, opinions and experiences from health forums.

PLoS One. 2019 Jan 9;14(1):e0209961. doi: 10.1371/journal.pone.0209961. eCollection 2019.

Predicting MeSH Beyond MEDLINE.

Proc 1st Workshop Sch Web Min (2017). 2017 Feb;2017:49-56. doi: 10.1145/3057148.3057155.

MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.

J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.

Gene Ontology synonym generation rules lead to increased performance in biomedical concept recognition.

J Biomed Semantics. 2016 Sep 9;7(1):52. doi: 10.1186/s13326-016-0096-7.

Large scale biomedical texts classification: a kNN and an ESA-based approaches.

J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.

Deep Question Answering for protein annotation.

Database (Oxford). 2015 Sep 16;2015. doi: 10.1093/database/bav081. Print 2015.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Bioinformatics. 2015 Jun 15;31(12):i339-47. doi: 10.1093/bioinformatics/btv237.

Analyzing Medical Image Search Behavior: Semantics and Prediction of Query Results.

J Digit Imaging. 2015 Oct;28(5):537-46. doi: 10.1007/s10278-015-9792-6.
