Entity linking for biomedical literature.

Authors

Zheng Jin G, Howsmon Daniel, Zhang Boliang, Hahn Juergen, McGuinness Deborah, Hendler James, Ji Heng

Publication information

BMC Med Inform Decis Mak. 2015;15 Suppl 1(Suppl 1):S4. doi: 10.1186/1472-6947-15-S1-S4. Epub 2015 May 20.

DOI:10.1186/1472-6947-15-S1-S4

PMID:26045232

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4460707/

Abstract

BACKGROUND

The Entity Linking (EL) task links entity mentions in an unstructured document to entities in a knowledge base. Although this problem is well studied for news and social media, it has received little attention in the life science domain. Solving the EL problem in the life science domain would let scientists build computational models of biological processes more efficiently. However, simply applying a news-trained entity linker produces inadequate results.

METHODS

Existing supervised approaches require a large amount of manually labeled training data, which is currently unavailable for the life science domain. We therefore propose a novel unsupervised collective inference approach that links entities from unstructured full texts of biomedical literature to 300 ontologies. The approach leverages the rich semantic information and structures in ontologies for similarity computation and entity ranking.

RESULTS

Without using any manual annotation, our approach significantly outperforms a state-of-the-art supervised EL method (9% absolute gain in linking accuracy), even though that supervised method requires 15,000 manually annotated entity mentions for training. These promising results establish a benchmark for the EL task in the life science domain. We also provide in-depth analysis and discussion of both the challenges and the opportunities in automatic knowledge enrichment for scientific literature.
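For readers unfamiliar with the metric, the sketch below shows how linking accuracy and an absolute gain (a difference in percentage points, as opposed to a relative improvement) could be computed. The entity ids and predictions are made-up toy values chosen only to illustrate the arithmetic; they are not the paper's data or results, and the function names are hypothetical.

```python
def linking_accuracy(predicted, gold):
    """Fraction of mentions whose predicted entity id equals the gold entity id."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def absolute_gain(acc_new, acc_base):
    """Difference between two accuracies, in percentage points."""
    return (acc_new - acc_base) * 100


def relative_gain(acc_new, acc_base):
    """Improvement relative to the baseline accuracy, in percent."""
    return (acc_new - acc_base) / acc_base * 100


# Toy gold annotations and two hypothetical system outputs.
gold = ["e1", "e2", "e3", "e4"]
system_a = ["e1", "e2", "e3", "x"]  # 3 of 4 correct
system_b = ["e1", "e2", "x", "x"]   # 2 of 4 correct

acc_a = linking_accuracy(system_a, gold)
acc_b = linking_accuracy(system_b, gold)
print(acc_a, acc_b, absolute_gain(acc_a, acc_b), relative_gain(acc_a, acc_b))
```

With these toy values the absolute gain is 25 percentage points while the relative gain is 50%, which is why the distinction matters when reading a figure such as a "9% absolute gain".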

CONCLUSIONS

In this paper, we propose a novel unsupervised collective inference approach to address the EL problem in a new domain. We show that our unsupervised approach outperforms a current state-of-the-art supervised approach that has been trained with a large amount of manually labeled data. The life sciences remain an underrepresented domain for applying EL techniques. By providing a small benchmark data set and identifying opportunities, we hope to stimulate discussion across natural language processing and bioinformatics and to motivate others to develop techniques for this largely untapped domain.
