A semi-supervised approach to extract pharmacogenomics-specific drug-gene pairs from biomedical literature for personalized medicine.

Affiliation

Medical Informatics Division, Case Western Reserve University, OH, USA.

Publication information

J Biomed Inform. 2013 Aug;46(4):585-93. doi: 10.1016/j.jbi.2013.04.001. Epub 2013 Apr 6.

Abstract

Personalized medicine aims to deliver the right drug to the right patient in the right dose. Pharmacogenomics (PGx) aims to identify genetic variants that may affect drug efficacy and toxicity. A comprehensive and accurate PGx-specific drug-gene relationship knowledge base is important for personalized medicine, but building one at large scale is difficult. In this study, we developed a bootstrapping, semi-supervised learning approach that iteratively extracts drug-gene pairs and ranks them by their relevance to drug pharmacogenomics. Starting with a single PGx-specific seed pair and 20 million MEDLINE abstracts, the extraction algorithm achieved a precision of 0.219, a recall of 0.368 and an F1 of 0.274 after two iterations, a significant improvement over the results of using non-PGx-specific seeds (precision: 0.011, recall: 0.018, F1: 0.014) or co-occurrence (precision: 0.015, recall: 1.000, F1: 0.030). After the extraction step, the ranking algorithm further improved the precision from 0.219 to 0.561 for top-ranked pairs. Compared with a dictionary-based approach that takes a PGx-specific gene lexicon as input, the bootstrapping approach performed better in both precision and F1 (precision: 0.251 vs. 0.152, recall: 0.396 vs. 0.856, F1: 0.292 vs. 0.254). Through integrative analysis with a large drug adverse event database, we showed that the extracted drug-gene pairs strongly correlate with drug adverse events. In conclusion, we developed a novel semi-supervised bootstrapping approach for effective PGx-specific drug-gene pair extraction from a large number of MEDLINE articles with minimal human input.
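For readers comparing the figures above, F1 is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `f1_score` and the dictionary labels are illustrative assumptions, and the inputs are the rounded values quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded (precision, recall) values copied from the abstract.
# The labels are illustrative, not names used by the authors.
reported = {
    "PGx-specific seed, two iterations": (0.219, 0.368),
    "non-PGx-specific seeds": (0.011, 0.018),
    "co-occurrence": (0.015, 1.000),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Because the inputs are already rounded to three decimals, a recomputed F1 can differ slightly from the published one. The abstract also does not say how the reported F1 values were aggregated, so an exact match should not be assumed for every configuration. The co-occurrence baseline shows why F1 is reported alongside recall: a recall of 1.000 paired with a precision of 0.015 still yields an F1 of only about 0.030.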

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/84bf/4452014/c1c254440d38/nihms604809f1.jpg
