Investigation into biomedical literature classification using support vector machines.

Author information

Polavarapu Nalini, Navathe Shamkant B, Ramnarayanan Ramprasad, ul Haque Abrar, Sahay Saurav, Liu Ying

Affiliation

School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Publication information

Proc IEEE Comput Syst Bioinform Conf. 2005:366-74. doi: 10.1109/csb.2005.36.

PMID:16447994

Abstract

Searching for a specific topic in the PubMed database, one of the most important information resources for the scientific community, poses a major challenge to users. Researchers typically formulate Boolean queries and then scan the retrieved records for relevance, which is time-consuming and error-prone. We applied support vector machines (SVMs) to automatically retrieve PubMed articles related to human genome epidemiology research at the Centers for Disease Control and Prevention (CDC). In this paper, we discuss several investigations into biomedical literature classification and analyze how the choice of keywords, training sets, kernel functions, and SVM parameters affects classification. Our results on these factors show that SVM is a viable technique for automatically classifying biomedical literature into topics of interest such as epidemiology, cancer, and birth defects. In all our experiments, we achieved high positive predictive value (PPV), sensitivity, and specificity.
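
The abstract names the ingredients (an SVM classifier for relevant vs. irrelevant articles, evaluated by PPV, sensitivity, and specificity) but not the implementation. As a rough illustration only, the following Python sketch trains a linear SVM on binary bag-of-words features using the Pegasos stochastic sub-gradient method and scores the predictions with those three measures. Pegasos, the tokenizer, the defaults `lam=0.01` and `epochs=20`, the function names, and the toy titles are all assumptions for illustration, not the authors' method or data. The sketch has no bias term and covers only the linear kernel, not the other kernel functions the study compares.

```python
import random
import re


# Sketch only: a linear SVM trained with Pegasos (a swapped-in method, not the
# authors' implementation). All names and defaults here are hypothetical.

def featurize(text):
    """Binary bag-of-words: each distinct lowercase alphanumeric token -> 1.0."""
    return {tok: 1.0 for tok in set(re.findall(r"[a-z0-9]+", text.lower()))}


def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_pegasos(docs, labels, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty, no bias term).

    labels: +1 for relevant articles, -1 for irrelevant ones.
    """
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)       # Pegasos step size
            margin = y * dot(w, x)      # margin under the current weights
            scale = 1.0 - eta * lam     # shrinkage from the L2 penalty
            for k in w:
                w[k] *= scale
            if margin < 1.0:            # hinge loss active: step toward y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (relevant) if the decision value is positive, else -1."""
    return 1 if dot(w, featurize(text)) > 0 else -1


def evaluate(y_true, y_pred):
    """PPV, sensitivity and specificity, treating +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    return {
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


if __name__ == "__main__":
    # Toy titles invented for illustration; not data from the study.
    docs = [
        "gene variant association epidemiology",
        "polymorphism association cohort epidemiology",
        "protein crystal structure",
        "enzyme kinetics assay",
    ]
    labels = [1, 1, -1, -1]
    w = train_pegasos(docs, labels)
    print(evaluate(labels, [predict(w, d) for d in docs]))
```

Scores computed on the training titles, as here, only confirm that the sketch runs; a meaningful estimate needs a held-out set of labeled articles. Using a non-linear kernel would require the dual or kernelized form of the SVM rather than the explicit weight dictionary used above.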

Similar articles

Investigation into biomedical literature classification using support vector machines.

Proc IEEE Comput Syst Bioinform Conf. 2005:366-74. doi: 10.1109/csb.2005.36.

A survey of current work in biomedical text mining.

Brief Bioinform. 2005 Mar;6(1):57-71. doi: 10.1093/bib/6.1.57.

Hairpins in bookstacks: information retrieval from biomedical text.

Brief Bioinform. 2005 Sep;6(3):222-38. doi: 10.1093/bib/6.3.222.

Protein annotation by EBIMed.

Nat Biotechnol. 2006 Aug;24(8):902-3. doi: 10.1038/nbt0806-902.

Bioinformatics. 2006 Sep 15;22(18):2298-304. doi: 10.1093/bioinformatics/btl388. Epub 2006 Aug 22.

Recognizing names in biomedical texts: a machine learning approach.

Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10.

Substring selection for biomedical document classification.

Bioinformatics. 2006 Sep 1;22(17):2136-42. doi: 10.1093/bioinformatics/btl350. Epub 2006 Jul 12.

PubSearch and PubFetch: a simple management system for semiautomated retrieval and annotation of biological information from the literature.

Curr Protoc Bioinformatics. 2006 Mar;Chapter 9:Unit9.7. doi: 10.1002/0471250953.bi0907s13.

Evaluating relevance ranking strategies for MEDLINE retrieval.

AMIA Annu Symp Proc. 2008 Nov 6;2008:439.

Discovering patterns to extract protein-protein interactions from the literature: Part II.

Bioinformatics. 2005 Aug 1;21(15):3294-300. doi: 10.1093/bioinformatics/bti493. Epub 2005 May 12.

Cited by

Extracting the latent needs of dementia patients and caregivers from transcribed interviews in japanese: an initial assessment of the availability of morpheme selection as input data with Z-scores in machine learning.

BMC Med Inform Decis Mak. 2023 Oct 5;23(1):203. doi: 10.1186/s12911-023-02303-3.

Recent advances in biomedical literature mining.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa057.

Representing and extracting lung cancer study metadata: study objective and study design.

Comput Biol Med. 2015 Mar;58:63-72. doi: 10.1016/j.compbiomed.2015.01.004. Epub 2015 Jan 13.

Caipirini: using gene sets to rank literature.

BioData Min. 2012 Feb 1;5(1):1. doi: 10.1186/1756-0381-5-1.

Automating curation using a natural language processing pipeline.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S10. doi: 10.1186/gb-2008-9-s2-s10. Epub 2008 Sep 1.

GAPscreener: an automatic tool for screening human genetic association literature in PubMed using the support vector machine technique.

BMC Bioinformatics. 2008 Apr 22;9:205. doi: 10.1186/1471-2105-9-205.
