Learning the Structure of Biomedical Relationships from Unstructured Text.

Authors

Percha Bethany, Altman Russ B

Affiliations

Biomedical Informatics Training Program, Stanford University, Stanford, California, United States of America.

Departments of Medicine, Genetics and Bioengineering, Stanford University, Stanford, California, United States of America.

Publication information

PLoS Comput Biol. 2015 Jul 28;11(7):e1004216. doi: 10.1371/journal.pcbi.1004216. eCollection 2015 Jul.

DOI:10.1371/journal.pcbi.1004216

PMID:26219079

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4517797/

Abstract

The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining.
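The abstract names the algorithm but does not spell out its mechanics, so the following is only a minimal sketch of the general ensemble idea, not the authors' implementation. It assumes that a biclustering routine has already been run several times, that each run assigns every drug-gene pair to a row cluster, and that a candidate pair is scored by how often it lands in the same cluster as pairs from a known seed set. The names `cooccurrence`, `score`, `runs`, and the `pair_*` items are hypothetical, and the cluster labels are toy values chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(runs):
    """Count, for each pair of items, how many runs place them in the same cluster.

    `runs` is a list of dicts mapping item -> cluster id (one dict per run).
    """
    counts = defaultdict(int)
    for labels in runs:
        by_cluster = defaultdict(list)
        for item, cluster in labels.items():
            by_cluster[cluster].append(item)
        for members in by_cluster.values():
            # Keys are stored in sorted order so lookups are order-independent.
            for a, b in combinations(sorted(members), 2):
                counts[(a, b)] += 1
    return counts


def score(item, seeds, counts, n_runs):
    """Mean fraction of runs in which `item` shares a cluster with a seed item."""
    others = [s for s in seeds if s != item]
    if not others or n_runs == 0:
        return 0.0
    total = sum(counts.get(tuple(sorted((item, s))), 0) for s in others)
    return total / (len(others) * n_runs)


# Toy ensemble (assumed data): three clustering runs over four items.
runs = [
    {"pair_a": 0, "pair_b": 0, "pair_c": 1, "pair_d": 1},
    {"pair_a": 0, "pair_b": 0, "pair_c": 0, "pair_d": 1},
    {"pair_a": 1, "pair_b": 1, "pair_c": 0, "pair_d": 0},
]
counts = cooccurrence(runs)
seeds = ["pair_a"]  # hypothetical curated seed set

for candidate in ["pair_b", "pair_c", "pair_d"]:
    print(candidate, round(score(candidate, seeds, counts, len(runs)), 3))
```

Aggregating over many runs is what makes the score robust to the randomness of any single clustering: an item that co-clusters with the seed set only occasionally receives a low score, while one that does so consistently ranks near the top.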

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c137/4517797/ce1dbdedd86c/pcbi.1004216.g001.jpg
