Using PharmGKB to train text mining approaches for identifying potential gene targets for pharmacogenomic studies.

Affiliation

College of Pharmacy, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

J Biomed Inform. 2012 Oct;45(5):862-9. doi: 10.1016/j.jbi.2012.04.007. Epub 2012 May 4.

DOI:10.1016/j.jbi.2012.04.007

PMID:22564551

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3438361/

Abstract

The main objective of this study was to investigate the feasibility of using PharmGKB, a pharmacogenomic database, as a source of training data in combination with the text of MEDLINE abstracts for a text mining approach to identifying potential gene targets for pathway-driven pharmacogenomics research. We used the manually curated relations between drugs and genes in the PharmGKB database to train a support vector machine predictive model and applied this model prospectively to MEDLINE abstracts. The gene targets suggested by this approach were subsequently manually reviewed. Our quantitative analysis showed that support vector machine classifiers trained on MEDLINE abstracts, with single words (unigrams) used as features and PharmGKB relations used for supervision, achieved an overall sensitivity of 85% and specificity of 69%. The subsequent qualitative analysis showed that gene targets "suggested" by the automatic classifier were not anticipated by expert reviewers but were subsequently found to be relevant to the three drugs that were investigated: carbamazepine, lamivudine and zidovudine. Our results show that this approach is not only feasible but may also find new gene targets not identifiable by other methods, making it a valuable tool for pathway-driven pharmacogenomics research.
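
As a rough illustration of two ingredients named in the abstract, the sketch below shows how unigram counts might be extracted from abstract text and how sensitivity and specificity are computed from binary predictions. It is not the authors' implementation: the function names, the tokenization rule, the toy sentence, and the toy labels are all assumptions made for illustration, and the support vector machine itself is left out (in practice it would come from a machine learning library).

```python
import re
from collections import Counter


def unigram_features(text):
    """Lowercase the text and count single-word tokens (unigrams).

    The tokenization rule (runs of letters and digits) is an assumption;
    the abstract does not say how words were split.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = relevant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical toy input, not data from the study.
features = unigram_features(
    "Carbamazepine induces CYP3A4; CYP3A4 metabolizes carbamazepine."
)

# Hypothetical gold labels and classifier outputs for eight abstracts.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity here is the fraction of truly relevant items the classifier flags, and specificity is the fraction of irrelevant items it correctly rejects, which is how the reported 85% and 69% should be read.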
