Toward creation of a cancer drug toxicity knowledge base: automatically extracting cancer drug-side effect relationships from the literature.

Affiliation

Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, Ohio, USA.

Publication information

J Am Med Inform Assoc. 2014 Jan-Feb;21(1):90-6. doi: 10.1136/amiajnl-2012-001584. Epub 2013 May 18.

DOI:10.1136/amiajnl-2012-001584

PMID:23686935

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3912715/

Abstract

OBJECTIVE

A comprehensive and machine-understandable cancer drug-side effect (drug-SE) relationship knowledge base is important for in silico cancer drug target discovery, drug repurposing, and toxicity prediction, and for personalized risk-benefit decisions by cancer patients. While US Food and Drug Administration (FDA) drug labels capture well-known cancer drug SE information, much cancer drug SE knowledge remains buried in the published biomedical literature. We present a relationship extraction approach to extract cancer drug-SE pairs from the literature.

DATA AND METHODS

We used 21,354,075 MEDLINE records as the text corpus. We extracted drug-SE co-occurrence pairs using a cancer drug lexicon and a clean SE lexicon that we created. We then developed two filtering approaches to remove drug-disease treatment pairs and subsequently a ranking scheme to further prioritize filtered pairs. Finally, we analyzed relationships among SEs, gene targets, and indications.
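
The co-occurrence step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicons, the function names `find_terms` and `cooccurrence_pairs`, and the choice of sentence-level matching are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; the lexicons described in the abstract are far larger.
DRUG_LEXICON = {"cisplatin", "imatinib"}
SE_LEXICON = {"nausea", "neutropenia", "hearing loss"}


def find_terms(sentence, lexicon):
    """Return the lexicon terms that occur in the sentence as whole words."""
    lowered = sentence.lower()
    return {
        term
        for term in lexicon
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def cooccurrence_pairs(sentences):
    """Count (drug, side effect) pairs mentioned in the same sentence.

    Sentence-level co-occurrence is an assumption of this sketch.
    """
    counts = Counter()
    for sentence in sentences:
        drugs = find_terms(sentence, DRUG_LEXICON)
        effects = find_terms(sentence, SE_LEXICON)
        counts.update(product(drugs, effects))
    return counts


sentences = [
    "Cisplatin frequently causes nausea and hearing loss.",
    "Nausea was reported after cisplatin infusion.",
    "Imatinib was associated with neutropenia.",
]
pairs = cooccurrence_pairs(sentences)
for (drug, effect), n in sorted(pairs.items()):
    print(drug, effect, n)
```

Raw co-occurrence of this kind cannot tell a side effect from a treated condition, which is why the abstract describes filtering out drug-disease treatment pairs and then ranking what remains.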

RESULTS

We extracted 56,602 cancer drug-SE pairs. The filtering algorithms improved the precision of extracted pairs from 0.252 at baseline to 0.426, representing a 69% improvement in precision with no decrease in recall. The ranking algorithm further prioritized filtered pairs and achieved a precision of 0.778 for top-ranked pairs. We showed that cancer drugs that share SEs tend to have overlapping gene targets and overlapping indications.
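
The reported 69% figure is a relative improvement, not a difference in percentage points. A short Python check, using only the two precision values stated above (the variable names are illustrative):

```python
# Precision values as reported in the abstract.
baseline_precision = 0.252
filtered_precision = 0.426

# Relative improvement = (new - old) / old.
relative_improvement = (filtered_precision - baseline_precision) / baseline_precision

# Absolute gain, for contrast with the relative figure.
absolute_gain = filtered_precision - baseline_precision

print(f"relative improvement: {relative_improvement:.0%}")
print(f"absolute gain: {absolute_gain:.3f}")
```

The absolute gain is 0.174, which is about 69% of the 0.252 baseline.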

CONCLUSIONS

The relationship extraction approach is effective in extracting many cancer drug-SE pairs from the literature. This unique knowledge base, when combined with existing cancer drug SE knowledge, can facilitate drug target discovery, drug repurposing, and toxicity prediction.
