Combining Literature Mining and Machine Learning for Predicting Biomedical Discoveries.

Affiliations

DRDO-BU Center for Life Sciences, Bharathiar University Campus, Coimbatore, Tamilnadu, India.

Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, USA.

Publication information

Methods Mol Biol. 2022;2496:123-140. doi: 10.1007/978-1-0716-2305-3_7.

DOI:10.1007/978-1-0716-2305-3_7

PMID:35713862

Abstract

The major outcomes and insights of scientific research and clinical studies end up as publications or clinical records in unstructured text. With advances in biomedical research, the volume of published literature has grown tremendously in recent years. Scientists and clinical researchers face a major challenge in staying current and in extracting hidden information from millions of published biomedical articles. Biomedical literature mining is a potential one-stop automated solution to this problem. One of the long-standing goals in biology is to discover disease-causing genes and their specific roles in personalized precision medicine and drug repurposing. However, empirical approaches and clinical validation are expensive and time-consuming. An in silico approach that uses text mining to identify disease-causing genes can contribute to biomarker discovery. This chapter presents a protocol for combining literature mining and machine learning to predict biomedical discoveries, with special emphasis on gene-disease relation-based discovery. The protocol is presented as a literature-based discovery (LBD) pipeline. It includes our web-based tools: (1) DNER (Disease Named Entity Recognizer) for disease entity recognition, (2) BCCNER (Bidirectional, Contextual clues Named Entity Tagger) for gene/protein entity recognition, (3) DisGeReExT (Disease-Gene Relation Extractor) for statistically validated results and visualization, and (4) a newly introduced deep learning-based method for association discovery. The proposed deep learning-based method can be generalized and applied to other important biomedical discoveries focusing on entities such as drugs/chemicals or miRNAs.
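The general idea of the pipeline (recognize disease and gene mentions, then score candidate gene-disease pairs) can be illustrated with a minimal sketch. This is not the authors' implementation and does not call DNER, BCCNER, or DisGeReExT. It assumes tiny hypothetical lexicons in place of trained entity recognizers, and it swaps in sentence-level co-occurrence with pointwise mutual information (PMI) as a simple stand-in for the statistical validation and deep learning steps. The names `GENES`, `DISEASES`, `find_mentions`, and `cooccurrence_pmi` are illustrative only.

```python
import math
import re
from collections import Counter
from itertools import product

# Hypothetical toy lexicons (assumption): stand-ins for real entity recognizers.
GENES = {"BRCA1", "TP53", "APOE"}
DISEASES = {"breast cancer", "alzheimer's disease"}


def find_mentions(sentence, lexicon):
    """Return lexicon terms found in the sentence as whole words (case-insensitive)."""
    found = set()
    for term in lexicon:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrence_pmi(sentences):
    """Score gene-disease pairs by PMI, using sentences as the co-occurrence window."""
    gene_counts, disease_counts, pair_counts = Counter(), Counter(), Counter()
    for sentence in sentences:
        genes = find_mentions(sentence, GENES)
        diseases = find_mentions(sentence, DISEASES)
        gene_counts.update(genes)
        disease_counts.update(diseases)
        pair_counts.update(product(genes, diseases))
    n = len(sentences)
    scores = {}
    for (gene, disease), joint in pair_counts.items():
        # PMI = log2( P(gene, disease) / (P(gene) * P(disease)) )
        scores[(gene, disease)] = math.log2(
            (joint * n) / (gene_counts[gene] * disease_counts[disease])
        )
    return scores


# Made-up example sentences (assumption), not drawn from any corpus.
sentences = [
    "Mutations in BRCA1 increase the risk of breast cancer.",
    "BRCA1 carriers were screened for breast cancer.",
    "APOE variants are linked to Alzheimer's disease.",
    "TP53 is frequently mutated in many tumours.",
]
scores = cooccurrence_pmi(sentences)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

A positive PMI means a pair co-occurs more often than independent mentions would predict. With so few sentences the scores are not meaningful; a real pipeline would use trained recognizers, a large corpus, and a proper significance test.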

Cited by

Deep learning-based discovery of compounds for blood pressure lowering effects.

Sci Rep. 2025 Jan 2;15(1):54. doi: 10.1038/s41598-024-83924-0.
