College of Pharmacy, University of Minnesota, Minneapolis, MN 55455, USA.
J Biomed Inform. 2012 Oct;45(5):862-9. doi: 10.1016/j.jbi.2012.04.007. Epub 2012 May 4.
The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data, in combination with the text of MEDLINE abstracts, in a text mining approach to identifying potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieved an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that some gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, making it a valuable tool for pathway-driven pharmacogenomics research.
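As an illustration of two terms used in the abstract, the following minimal Python sketch shows how unigram features can be counted from text and how sensitivity and specificity are computed from binary labels. It is not the study's implementation: the function names, the placeholder sentence, and the toy labels are all hypothetical, and the support vector machine training step is omitted.

```python
import re
from collections import Counter


def unigram_features(text):
    """Lower-case the text and count single-word tokens (unigrams)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical placeholder sentence, not taken from MEDLINE.
features = unigram_features(
    "Drug X alters expression of gene Y; gene Y affects drug X clearance."
)

# Hypothetical toy labels: 1 = drug-gene relation present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the fraction of true drug-gene relations that the classifier recovers, and specificity is the fraction of non-relations it correctly rejects. The reported 85% and 69% therefore describe a classifier that misses relatively few known relations at the cost of more false positives.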