Ranking gene-drug relationships in biomedical literature using Latent Dirichlet Allocation.

Author information

Wu Yonghui, Liu Mei, Zheng W Jim, Zhao Zhongming, Xu Hua

Affiliation

Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.

Publication information

Pac Symp Biocomput. 2012:422-33.

Abstract

Drug responses vary greatly among individuals because of human genetic variation; the study of this gene-dependent variation in drug response is known as pharmacogenomics (PGx). Much PGx knowledge is embedded in the biomedical literature, and there is growing interest in developing text mining approaches to extract it. In this paper, we present a study that ranks candidate gene-drug relations using the Latent Dirichlet Allocation (LDA) model. Our approach consists of three steps: 1) recognize gene and drug entities in MEDLINE abstracts; 2) extract candidate gene-drug pairs based on different levels of co-occurrence, namely abstract level, sentence level, and phrase level; and 3) rank the candidate gene-drug pairs using several methods: term frequency, the Chi-square test, Mutual Information (MI), a previously reported Kullback-Leibler (KL) distance based on topics derived from LDA (LDA-KL), and a newly defined probabilistic KL distance based on LDA (LDA-PKL). We systematically evaluated these methods using a gold standard data set of gene-drug relations derived from PharmGKB. Our results showed that the proposed LDA-PKL method achieved a better Mean Average Precision (MAP) than all the other methods, suggesting that it is promising for ranking and detecting PGx relations.
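The co-occurrence extraction, the simpler ranking baselines, and the MAP evaluation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes entity recognition has already reduced each text unit (abstract, sentence, or phrase) to sets of normalized gene and drug names. It uses pointwise mutual information as the MI variant, which is an assumption because the abstract does not say which MI formulation was used. Average precision is computed against a set of gold pairs. LDA-KL and LDA-PKL are not reproduced because the abstract does not define them. `kl_divergence` only shows the plain KL divergence between two discrete distributions (for example, topic mixtures) that such scores presumably build on. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical sketch of co-occurrence ranking and MAP evaluation; not the paper's code.


def cooccurrence_counts(units):
    """Count how many text units (abstracts, sentences, or phrases) mention each
    gene, each drug, and each gene-drug pair. Each unit is a (genes, drugs) pair
    of iterables of normalized entity names (an assumed input format)."""
    gene_n, drug_n, pair_n = Counter(), Counter(), Counter()
    for genes, drugs in units:
        genes, drugs = set(genes), set(drugs)
        gene_n.update(genes)
        drug_n.update(drugs)
        pair_n.update(product(genes, drugs))  # every gene-drug combination in the unit
    return gene_n, drug_n, pair_n, len(units)


def contingency(pair, gene_n, drug_n, pair_n, n):
    """2x2 table: a = both mentioned, b = gene only, c = drug only, d = neither."""
    gene, drug = pair
    a = pair_n[pair]
    b = gene_n[gene] - a
    c = drug_n[drug] - a
    return a, b, c, n - a - b - c


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def pmi(a, b, c, d):
    """Pointwise mutual information ln(P(gene, drug) / (P(gene) P(drug)))."""
    n = a + b + c + d
    return math.log(a * n / ((a + b) * (a + c))) if a else float("-inf")


def rank_pairs(counts, score):
    """Rank all co-occurring pairs by descending score; ties broken by name."""
    gene_n, drug_n, pair_n, n = counts
    scored = {p: score(*contingency(p, gene_n, drug_n, pair_n, n)) for p in pair_n}
    return sorted(scored, key=lambda p: (-scored[p], p))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions, e.g. two topic mixtures.
    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def average_precision(ranked, gold):
    """Sum of precision@k at each rank k holding a gold relation,
    divided by the total number of gold relations."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            total += hits / k
    return total / len(gold) if gold else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_list, gold_set) runs, e.g. one run per query."""
    return sum(average_precision(r, g) for r, g in runs) / len(runs)
```

For example, `rank_pairs(cooccurrence_counts(units), pmi)` ranks pairs by PMI. Passing `lambda a, b, c, d: a` instead ranks them by raw co-occurrence frequency, and passing `chi_square` ranks them by the chi-square statistic. Note that chi-square is unsigned. A pair that co-occurs less often than expected can therefore score as high as a positively associated pair, which is one reason association scores are usually compared empirically, as this study does with MAP.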
