Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature.

Author information

Xu Rong, Wang QuanQiu

Publication information

BMC Bioinformatics. 2015;16 Suppl 5(Suppl 5):S6. doi: 10.1186/1471-2105-16-S5-S6. Epub 2015 Mar 18.

DOI:10.1186/1471-2105-16-S5-S6

PMID:25860223

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4402591/

Abstract

BACKGROUND

Systems approaches to studying drug-side-effect (drug-SE) associations are emerging as an active research area for both drug target discovery and drug repositioning. However, a comprehensive drug-SE association knowledge base does not exist. In this study, we present a novel knowledge-driven (KD) approach to effectively extract a large number of drug-SE pairs from published biomedical literature.

DATA AND METHODS

For the text corpus, we used 21,354,075 MEDLINE records (119,085,682 sentences). First, we used known drug-SE associations derived from FDA drug labels as prior knowledge to automatically find SE-related sentences and abstracts. We then extracted a total of 49,575 drug-SE pairs from MEDLINE sentences and 180,454 pairs from abstracts.
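The filtering idea described above can be illustrated with a minimal sketch. The abstract does not spell out the matching procedure, so everything here is an assumption made for illustration: the function names (`find_terms`, `extract_pairs`), the simple whole-word lexicon matching, the rule that a sentence counts as SE-related when it contains at least one known drug-SE pair, and the toy data with placeholder drug names.

```python
import re
from itertools import product


def find_terms(sentence, lexicon):
    """Return the lowercased lexicon terms that occur as whole words in the sentence."""
    text = sentence.lower()
    return {
        term.lower()
        for term in lexicon
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    }


def extract_pairs(sentences, drugs, side_effects, known_pairs):
    """Collect drug-SE pairs from sentences that contain at least one known pair.

    Assumption for this sketch: a sentence is treated as SE-related when any
    co-occurring (drug, SE) pair in it is already in the prior knowledge.
    """
    known = {(d.lower(), s.lower()) for d, s in known_pairs}
    extracted = set()
    for sentence in sentences:
        found_drugs = find_terms(sentence, drugs)
        found_ses = find_terms(sentence, side_effects)
        candidates = set(product(found_drugs, found_ses))
        if candidates & known:  # prior knowledge marks this sentence as SE-related
            extracted |= candidates
    return extracted


# Toy data with placeholder names (hypothetical, not taken from the paper).
sentences = [
    "DrugA and DrugB were both associated with nausea.",
    "DrugB is used to treat rash.",
]
drugs = {"drugA", "drugB"}
side_effects = {"nausea", "rash"}
known_pairs = {("drugA", "nausea")}

pairs = extract_pairs(sentences, drugs, side_effects, known_pairs)
print(sorted(pairs))  # [('druga', 'nausea'), ('drugb', 'nausea')]
```

In this toy run the first sentence contains a known pair, so the co-occurring pair for DrugB is extracted as well, while the second sentence is skipped because none of its pairs is known.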

RESULTS

On average, the KD approach achieved a precision of 0.335, a recall of 0.509, and an F1 of 0.392, which is significantly better than an SVM-based machine learning approach (precision: 0.135, recall: 0.900, F1: 0.233), with a 73.0% increase in F1 score. Through integrative analysis, we demonstrate that the higher-level phenotypic drug-SE relationships reflect lower-level genetic, genomic, and chemical drug mechanisms. In addition, we show that the extracted drug-SE pairs can be directly used in drug repositioning.
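For readers less familiar with the metrics, the following minimal sketch shows the standard definitions of precision, recall, and F1, plus a relative increase between two scores. The function names and the toy counts are assumptions for illustration, not the authors' evaluation code. Because the abstract reports averaged values, recomputing F1 or the percentage increase from the averaged figures need not reproduce the reported numbers exactly.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


def relative_increase(new, old):
    """Relative increase of `new` over `old`, as a fraction (0.6 means 60%)."""
    return (new - old) / old


# Toy counts (hypothetical): 50 correct pairs, 100 spurious pairs, 50 missed pairs.
p, r, f1 = precision_recall_f1(tp=50, fp=100, fn=50)
print(round(p, 3), round(r, 3), round(f1, 3))

# Toy comparison of two F1 scores (hypothetical values).
print(round(relative_increase(0.4, 0.25), 3))
```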

CONCLUSION

In summary, we automatically constructed a large-scale knowledge base of higher-level drug-phenotype relationships, which may have great potential in computational drug discovery.
