Polavarapu Nalini, Navathe Shamkant B, Ramnarayanan Ramprasad, ul Haque Abrar, Sahay Saurav, Liu Ying
School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Proc IEEE Comput Syst Bioinform Conf. 2005:366-74. doi: 10.1109/csb.2005.36.
Searching the PubMed database, one of the most important information resources for the scientific community, for a specific topic is a major challenge for users. Researchers typically formulate Boolean queries and then scan the retrieved records for relevance, a process that is time-consuming and error-prone. We applied Support Vector Machines (SVM) to automatically retrieve PubMed articles related to human genome epidemiology research at the CDC (Centers for Disease Control and Prevention). In this paper, we describe our investigations into biomedical literature classification and analyze how the choice of keywords, training sets, kernel functions, and SVM parameters affects classification. Our results on these factors show that SVM is a viable technique for automatically classifying biomedical literature into topics of interest such as epidemiology, cancer, and birth defects. In all our experiments, we achieved high positive predictive value (PPV), sensitivity, and specificity.
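The abstract reports PPV, sensitivity, and specificity without defining them. As a minimal illustration (not the authors' evaluation code), these standard metrics can be computed from a binary confusion matrix in which label 1 marks an article as relevant to the topic and 0 as not relevant. The function name `classification_metrics` and the toy labels are hypothetical, and returning 0.0 for an undefined ratio is an assumed convention.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute PPV, sensitivity, and specificity for binary labels (1 = relevant).

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the binary confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return {
        # PPV (precision): fraction of articles classified as relevant that truly are.
        "ppv": ratio(tp, tp + fp),
        # Sensitivity (recall): fraction of truly relevant articles that were retrieved.
        "sensitivity": ratio(tp, tp + fn),
        # Specificity: fraction of irrelevant articles correctly rejected.
        "specificity": ratio(tn, tn + fp),
    }


if __name__ == "__main__":
    # Toy example: 4 relevant and 6 irrelevant articles (made-up labels).
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    print(classification_metrics(truth, predicted))
```

PPV is the metric that most directly reflects the user's burden of scanning retrieved records, since it measures how many of the returned articles are actually on-topic. Sensitivity, by contrast, measures how many relevant articles are missed.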