A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine.

Affiliation

Medical Informatics Division, Case Western Reserve University, OH, USA.

Publication information

J Biomed Inform. 2013 Aug;46(4):585-93. doi: 10.1016/j.jbi.2013.04.001. Epub 2013 Apr 6.

DOI:10.1016/j.jbi.2013.04.001

PMID:23570835

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4452014/

Abstract

Personalized medicine aims to deliver the right drug to the right patient at the right dose. Pharmacogenomics (PGx) aims to identify genetic variants that may affect drug efficacy and toxicity. A comprehensive and accurate PGx-specific drug-gene relationship knowledge base is important for personalized medicine. However, building such a knowledge base at large scale is difficult. In this study, we developed a bootstrapping, semi-supervised learning approach that iteratively extracts drug-gene pairs and ranks them by their relevance to drug pharmacogenomics. Starting with a single PGx-specific seed pair and 20 million MEDLINE abstracts, the extraction algorithm achieved a precision of 0.219, recall of 0.368 and F1 of 0.274 after two iterations, a significant improvement over using non-PGx-specific seeds (precision: 0.011, recall: 0.018, F1: 0.014) or co-occurrence (precision: 0.015, recall: 1.000, F1: 0.030). After the extraction step, the ranking algorithm further improved the precision of top-ranked pairs from 0.219 to 0.561. Compared with a dictionary-based approach that takes a PGx-specific gene lexicon as input, the bootstrapping approach performed better in both precision and F1 (precision: 0.251 vs. 0.152, recall: 0.396 vs. 0.856, F1: 0.292 vs. 0.254). An integrative analysis using a large drug adverse event database showed that the extracted drug-gene pairs strongly correlate with drug adverse events. In conclusion, we developed a novel semi-supervised bootstrapping approach for effective PGx-specific drug-gene pair extraction from a large number of MEDLINE articles with minimal human input.
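The abstract does not say how patterns are represented or scored, so the following is only a generic, DIPRE/Snowball-style bootstrapping loop written under stated assumptions, not the authors' implementation. Assumptions: a "pattern" is simply the lower-cased text between a drug mention and a gene mention in the same sentence; the toy sentences, the `DRUGS`/`GENES` lexicons, the `GOLD` set and every function name (`bootstrap`, `cooccurrence`, `prf`, `f1_score`) are hypothetical. The ranking step is not modeled. The sketch also includes a co-occurrence baseline and set-based precision, recall and F1, where F1 is the harmonic mean 2PR/(P+R).

```python
import re
from itertools import product

# Hypothetical toy corpus: well-known PGx pairs in invented sentences.
SENTENCES = [
    "Warfarin response is affected by variants of CYP2C9.",
    "Clopidogrel response is affected by variants of CYP2C19.",
    "Clopidogrel activation requires CYP2C19.",
    "Codeine activation requires CYP2D6.",
    "Aspirin was given to patients genotyped for CYP2D6.",
    "Warfarin sensitivity is associated with VKORC1.",
]
DRUGS = {"warfarin", "clopidogrel", "codeine", "aspirin"}
GENES = {"CYP2C9", "CYP2C19", "CYP2D6", "VKORC1"}
SEED = ("warfarin", "CYP2C9")  # the single seed pair (assumed for illustration)
GOLD = {("warfarin", "CYP2C9"), ("clopidogrel", "CYP2C19"),
        ("codeine", "CYP2D6"), ("warfarin", "VKORC1")}


def find_mentions(sentence, lexicon):
    """Return (term, start, end) for every whole-word, case-insensitive lexicon hit."""
    hits = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((term, m.start(), m.end()))
    return hits


def co_mentions(sentence, drugs, genes):
    """Yield ((drug, gene), middle_context) for each drug-gene pair in one sentence."""
    for d, g in product(find_mentions(sentence, drugs), find_mentions(sentence, genes)):
        first, second = sorted((d, g), key=lambda hit: hit[1])
        # Assumed pattern representation: lower-cased text between the two mentions.
        yield (d[0], g[0]), sentence[first[2]:second[1]].strip().lower()


def bootstrap(sentences, seeds, drugs, genes, iterations=2):
    """Alternate between learning patterns from known pairs and applying them."""
    candidates = [c for s in sentences for c in co_mentions(s, drugs, genes)]
    pairs, patterns = set(seeds), set()
    for _ in range(iterations):
        # Learn: contexts of sentences that mention an already-accepted pair.
        patterns |= {ctx for pair, ctx in candidates if pair in pairs}
        # Apply: accept any co-mentioned pair whose context matches a learned pattern.
        pairs |= {pair for pair, ctx in candidates if ctx in patterns}
    return pairs, patterns


def cooccurrence(sentences, drugs, genes):
    """Baseline: every drug-gene pair mentioned together in a sentence."""
    return {pair for s in sentences for pair, _ in co_mentions(s, drugs, genes)}


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(predicted, gold):
    """Set-based precision, recall and F1 against a gold-standard pair set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, f1_score(p, r)


if __name__ == "__main__":
    pairs, patterns = bootstrap(SENTENCES, {SEED}, DRUGS, GENES, iterations=2)
    print(sorted(pairs), prf(pairs, GOLD))
    print(prf(cooccurrence(SENTENCES, DRUGS, GENES), GOLD))
    # Recompute F1 from the rounded extraction-step figures quoted in the abstract.
    for p, r in [(0.219, 0.368), (0.011, 0.018), (0.015, 1.000)]:
        print(round(f1_score(p, r), 3))
```

On the toy corpus, one iteration finds only the pair that shares the seed's pattern (clopidogrel-CYP2C19); the second iteration learns "activation requires" from that new pair and picks up codeine-CYP2D6, a small-scale picture of iterative expansion from a single seed. The co-occurrence baseline reaches recall 1.0 by returning every co-mentioned pair, including aspirin-CYP2D6, which is not in the toy gold set. Recomputing F1 from the rounded extraction-step precision and recall gives 0.275, 0.014 and 0.030, within rounding of the reported 0.274, 0.014 and 0.030.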

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84bf/4452014/c1c254440d38/nihms604809f1.jpg

Similar articles

A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine.

J Biomed Inform. 2013 Aug;46(4):585-93. doi: 10.1016/j.jbi.2013.04.001. Epub 2013 Apr 6.

An iterative searching and ranking algorithm for prioritising pharmacogenomics genes.

Int J Comput Biol Drug Des. 2013;6(1-2):18-31. doi: 10.1504/IJCBDD.2013.052199. Epub 2013 Feb 21.

A knowledge-driven conditional approach to extract pharmacogenomics specific drug-gene relationships from free text.

J Biomed Inform. 2012 Oct;45(5):827-34. doi: 10.1016/j.jbi.2012.04.011. Epub 2012 Apr 27.

Ranking gene-drug relationships in biomedical literature using Latent Dirichlet Allocation.

Pac Symp Biocomput. 2012:422-33.

Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug-side effect relationships from the literature.

J Am Med Inform Assoc. 2014 Jan-Feb;21(1):90-6. doi: 10.1136/amiajnl-2012-001584. Epub 2013 May 18.

Personalized Medicine: Pharmacogenomics and Drug Development.

Acta Med Iran. 2017 Mar;55(3):150-165.

Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S6. doi: 10.1186/1471-2105-16-S5-S6. Epub 2015 Mar 18.

Building an information system to facilitate pharmacogenomics clinical translation with clinical decision support.

Pharmacogenomics. 2022 Jan;23(1):35-48. doi: 10.2217/pgs-2021-0110. Epub 2021 Nov 17.

Pharmacogenomics - a minor rather than major force in clinical medicine.

Expert Rev Clin Pharmacol. 2024 Mar;17(3):203-212. doi: 10.1080/17512433.2024.2314726. Epub 2024 Feb 6.

Will Precision Medicine Meet Digital Health? A Systematic Review of Pharmacogenomics Clinical Decision Support Systems Used in Clinical Practice.

OMICS. 2024 Sep;28(9):442-460. doi: 10.1089/omi.2024.0131. Epub 2024 Aug 13.

Cited by

Global Text Mining and Development of Pharmacogenomic Knowledge Resource for Precision Medicine.

Front Pharmacol. 2019 Aug 7;10:839. doi: 10.3389/fphar.2019.00839. eCollection 2019.

Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches.

Front Pharmacol. 2019 May 17;10:415. doi: 10.3389/fphar.2019.00415. eCollection 2019.

Clinical Document Classification Using Labeled and Unlabeled Data Across Hospitals.

AMIA Annu Symp Proc. 2018 Dec 5;2018:545-554. eCollection 2018.

Translating cancer genomics into precision medicine with artificial intelligence: applications, challenges and future perspectives.

Hum Genet. 2019 Feb;138(2):109-124. doi: 10.1007/s00439-019-01970-5. Epub 2019 Jan 22.

Computational dynamic approaches for temporal omics data with applications to systems medicine.

BioData Min. 2017 Jun 17;10:20. doi: 10.1186/s13040-017-0140-x. eCollection 2017.

Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach.

PLoS One. 2016 May 19;11(5):e0156091. doi: 10.1371/journal.pone.0156091. eCollection 2016.

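Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S6. doi: 10.1186/1471-2105-16-S5-S6. Epub 2015 Mar 18.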

Automatic construction of a large-scale and accurate drug-side-effect association knowledge base from biomedical literature.

J Biomed Inform. 2014 Oct;51:191-9. doi: 10.1016/j.jbi.2014.05.013. Epub 2014 Jun 10.

Chemical named entities recognition: a review on approaches and applications.

J Cheminform. 2014 Apr 28;6:17. doi: 10.1186/1758-2946-6-17. eCollection 2014.

Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature.

Bioinformatics. 2013 Sep 1;29(17):2186-94. doi: 10.1093/bioinformatics/btt359. Epub 2013 Jul 4.

References

An iterative searching and ranking algorithm for prioritising pharmacogenomics genes.

Int J Comput Biol Drug Des. 2013;6(1-2):18-31. doi: 10.1504/IJCBDD.2013.052199. Epub 2013 Feb 21.

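A knowledge-driven conditional approach to extract pharmacogenomics specific drug-gene relationships from free text.

J Biomed Inform. 2012 Oct;45(5):827-34. doi: 10.1016/j.jbi.2012.04.011. Epub 2012 Apr 27.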

Ranking gene-drug relationships in biomedical literature using Latent Dirichlet Allocation.

Pac Symp Biocomput. 2012:422-33.

Personalizing medicine with clinical pharmacogenetics.

Genet Med. 2011 Dec;13(12):987-95. doi: 10.1097/GIM.0b013e318238b38c.

Extraction of Conditional Probabilities of the Relationships Between Drugs, Diseases, and Genes from PubMed Guided by Relationships in PharmGKB.

Summit Transl Bioinform. 2009 Mar 1;2009:124-8.

Pharmacogenomics at the tipping point: challenges and opportunities.

Clin Pharmacol Ther. 2011 Mar;89(3):323-7. doi: 10.1038/clpt.2010.340.

Recent progress in automatically extracting information from the pharmacogenomic literature.

Pharmacogenomics. 2010 Oct;11(10):1467-89. doi: 10.2217/pgs.10.136.

Using text to build semantic networks for pharmacogenomics.

J Biomed Inform. 2010 Dec;43(6):1009-19. doi: 10.1016/j.jbi.2010.08.005. Epub 2010 Aug 17.

Unsupervised method for extracting machine understandable medical knowledge from a large free text collection.

AMIA Annu Symp Proc. 2009 Nov 14;2009:709-13.

DNA, drugs and chariots: on a decade of pharmacogenomics at the US FDA.

Pharmacogenomics. 2010 Apr;11(4):507-12. doi: 10.2217/pgs.10.16.
