An effective general purpose approach for automated biomedical document classification.

Author information

Cohen Aaron M

Affiliation

Oregon Health & Science University, Portland, OR, USA.

Publication information

AMIA Annu Symp Proc. 2006;2006:161-5.

PMID: 17238323

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1839342/

Abstract

Automated document classification can be a valuable tool for biomedical tasks that involve large amounts of text. However, in biomedicine, documents that have the desired properties are often rare, and special methods are usually required to address this issue. We propose and evaluate a method of classifying biomedical text documents, optimizing for utility when misclassification costs are highly asymmetric between the positive and negative classes. The method uses chi-square feature selection and several iterations of cost-proportionate rejection sampling followed by application of a support vector machine (SVM), combining the outputs of the resulting classifiers by voting. It is straightforward, fast, and achieves competitive performance on a set of standardized biomedical text classification evaluation tasks. The method is a good general-purpose approach for classifying biomedical text.
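The abstract names four stages: chi-square feature selection, repeated cost-proportionate rejection sampling, an SVM trained on each sample, and voting over the resulting classifiers. The sketch below strings these stages together in plain Python so the data flow is visible. It is not the author's implementation. The binary term-presence features, the 20:1 default cost ratio, the 50-term cutoff, five ensemble members, simple majority voting, and the Pegasos-style stochastic subgradient SVM (used here in place of a dedicated SVM package) are all assumptions, and every function name is hypothetical.

```python
import random
from collections import Counter

# Hypothetical sketch: k, n_models, the costs, lam and epochs are assumptions,
# not values taken from the paper.


def chi_square_scores(docs, labels):
    """Chi-square statistic of each term's presence/absence against a binary label (2x2 table)."""
    n, n_pos = len(docs), sum(labels)
    df_all, df_pos = Counter(), Counter()
    for terms, y in zip(docs, labels):
        for t in set(terms):
            df_all[t] += 1
            if y:
                df_pos[t] += 1
    scores = {}
    for t, df in df_all.items():
        a = df_pos[t]        # term present, positive class
        b = df - a           # term present, negative class
        c = n_pos - a        # term absent, positive class
        d = n - n_pos - b    # term absent, negative class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def rejection_sample(X, y, cost_pos, cost_neg, rng):
    """Cost-proportionate rejection sampling: keep each example with probability cost / max cost."""
    z = max(cost_pos, cost_neg)
    keep = [i for i, label in enumerate(y)
            if rng.random() < (cost_pos if label else cost_neg) / z]
    return [X[i] for i in keep], [y[i] for i in keep]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, rng, lam=0.01, epochs=20):
    """Linear SVM trained by Pegasos-style stochastic subgradient descent on the hinge loss."""
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            yi = 1.0 if y[i] else -1.0
            violated = yi * dot(w, X[i]) < 1.0        # margin checked before the update
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the L2 regularizer
            if violated:
                w = [wj + eta * yi * xj for wj, xj in zip(w, X[i])]
    return w


def vectorize(terms, vocab):
    """Binary term-presence vector; the leading 1.0 acts as a bias feature."""
    present = set(terms)
    return [1.0] + [1.0 if t in present else 0.0 for t in vocab]


def fit_ensemble(docs, labels, k=50, n_models=5, cost_pos=20.0, cost_neg=1.0, seed=0):
    """Chi-square selection, then n_models rounds of rejection sampling plus SVM training."""
    rng = random.Random(seed)
    scores = chi_square_scores(docs, labels)
    vocab = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    X = [vectorize(d, vocab) for d in docs]
    models = []
    for _ in range(n_models):
        Xs, ys = rejection_sample(X, labels, cost_pos, cost_neg, rng)
        if Xs:
            models.append(train_linear_svm(Xs, ys, rng))
    return vocab, models


def predict(terms, vocab, models):
    """Majority vote over the ensemble: positive if more than half the SVMs score above 0."""
    x = vectorize(terms, vocab)
    votes = sum(1 for w in models if dot(w, x) > 0)
    return 2 * votes > len(models)
```

On a toy corpus of ten `["randomized", "trial"]` documents and a hundred `["mouse", "gene"]` documents, `fit_ensemble(docs, labels, cost_pos=10.0, cost_neg=1.0)` keeps every positive and roughly one negative in ten per round, so each SVM sees a far more balanced sample than the full corpus. `predict` then labels a new document positive only when most ensemble members agree. Because rejection sampling discards most negatives, each SVM is trained on a much smaller set than the full corpus, which fits the abstract's description of the method as fast.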

Similar articles

Substring selection for biomedical document classification.

Bioinformatics. 2006 Sep 1;22(17):2136-42. doi: 10.1093/bioinformatics/btl350. Epub 2006 Jul 12.

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

Reflective random indexing for semi-automatic indexing of the biomedical literature.

J Biomed Inform. 2010 Oct;43(5):694-700. doi: 10.1016/j.jbi.2010.04.001. Epub 2010 Apr 9.

An automatic method for retrieving and indexing catalogues of biomedical courses.

AMIA Annu Symp Proc. 2008 Nov 6:922.

The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text.

J Biomed Inform. 2003 Dec;36(6):462-77. doi: 10.1016/j.jbi.2003.11.003.

Automatic extraction of candidate nomenclature terms using the doublet method.

BMC Med Inform Decis Mak. 2005 Oct 18;5:35. doi: 10.1186/1472-6947-5-35.

A method for verifying a vector-based text classification system.

AMIA Annu Symp Proc. 2008 Nov 6:1030.

Large scale biomedical texts classification: a kNN and an ESA-based approaches.

J Biomed Semantics. 2016 Jun 16;7:40. doi: 10.1186/s13326-016-0073-1.

Extracting drug-drug interaction articles from MEDLINE to improve the content of drug databases.

AMIA Annu Symp Proc. 2005;2005:216-20.

Cited by

Transformer-Based Language Models for Group Randomized Trial Classification in Biomedical Literature: Model Development and Validation.

JMIR Med Inform. 2025 May 9;13:e63267. doi: 10.2196/63267.

Accessing the Climate Change Impacts in China through a Literature Mapping.

Int J Environ Res Public Health. 2022 Oct 17;19(20):13411. doi: 10.3390/ijerph192013411.

Recurrent Neural Networks to Automatically Identify Rare Disease Epidemiologic Studies from PubMed.

AMIA Jt Summits Transl Sci Proc. 2021 May 17;2021:325-334. eCollection 2021.

Biomedical document triage using a hierarchical attention-based capsule network.

BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):380. doi: 10.1186/s12859-020-03673-5.

Integrating image caption information into biomedical document classification in support of biocuration.

Database (Oxford). 2020 Jan 1;2020. doi: 10.1093/database/baaa024.

SWIFT-Active Screener: Accelerated document screening through active learning and integrated recall estimation.

Environ Int. 2020 May;138:105623. doi: 10.1016/j.envint.2020.105623. Epub 2020 Mar 20.

Improving reference prioritisation with PICO recognition.

BMC Med Inform Decis Mak. 2019 Dec 5;19(1):256. doi: 10.1186/s12911-019-0992-8.

Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

J Am Med Inform Assoc. 2015 May;22(3):707-17. doi: 10.1093/jamia/ocu025. Epub 2015 Feb 5.

Using text mining for study identification in systematic reviews: a systematic review of current approaches.

Syst Rev. 2015 Jan 14;4(1):5. doi: 10.1186/2046-4053-4-5.

The role of the electronic medical record in the assessment of health related quality of life.

AMIA Annu Symp Proc. 2011;2011:1080-8. Epub 2011 Oct 22.
