GRAM-CNN: a deep learning approach with local context for named entity recognition in biomedical text.

Affiliations

National Science Foundation Center for Big Learning, University of Florida, Gainesville, FL 32611, USA.

Department of Computer & Information Science & Engineering, University of Florida, Gainesville, FL 32611, USA.

Publication information

Bioinformatics. 2018 May 1;34(9):1547-1554. doi: 10.1093/bioinformatics/btx815.

DOI:10.1093/bioinformatics/btx815

PMID:29272325

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5925775/

Abstract

MOTIVATION

The best-performing named entity recognition (NER) methods for biomedical literature rely on hand-crafted features or task-specific rules, which are costly to produce and difficult to generalize to other corpora. In non-biomedical NER tasks, end-to-end neural networks achieve state-of-the-art performance without hand-crafted features or task-specific knowledge. In the biomedical domain, however, the same architectures do not yield performance competitive with conventional machine learning models.

RESULTS

We propose a novel end-to-end deep learning approach for biomedical NER that uses a convolutional neural network (CNN) to exploit local context based on n-gram character and word embeddings. We call this approach GRAM-CNN. To label a word automatically, the method uses the local information around that word. GRAM-CNN therefore requires no domain-specific knowledge or feature engineering and can, in theory, be applied to a wide range of existing NER problems. We evaluated GRAM-CNN on three well-known biomedical datasets containing different BioNER entities. It obtained an F1-score of 87.26% on the Biocreative II dataset, 87.26% on the NCBI dataset and 72.57% on the JNLPBA dataset. These results put GRAM-CNN at the forefront of biological NER methods. To the best of our knowledge, we are the first to apply CNN-based structures to BioNER problems.
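The idea of labelling a word from its local n-gram context can be illustrated with a small sketch. The code below is not the GRAM-CNN implementation; it is a minimal pure-Python illustration that assumes odd window widths (1, 3 and 5 here), zero padding at sentence boundaries, a ReLU activation and random weights. The function name `ngram_conv_features` and all parameter values are hypothetical and chosen only for this example.

```python
import random


def ngram_conv_features(embeddings, kernels):
    """Return, for every token, the concatenated ReLU responses of all
    filters applied to a window centred on that token.

    embeddings: list of d-dimensional vectors, one per token.
    kernels: dict mapping an odd window width to a list of filters;
             each filter is a width x d weight matrix (list of lists).
    """
    n = len(embeddings)
    d = len(embeddings[0])
    zero = [0.0] * d  # padding vector used beyond the sentence edges
    features = []
    for i in range(n):
        token_feats = []
        for width, filters in sorted(kernels.items()):
            assert width % 2 == 1, "this sketch assumes odd window widths"
            half = width // 2
            # Window of `width` tokens centred on position i.
            window = [
                embeddings[j] if 0 <= j < n else zero
                for j in range(i - half, i + half + 1)
            ]
            for filt in filters:
                act = sum(
                    w * x
                    for w_row, x_row in zip(filt, window)
                    for w, x in zip(w_row, x_row)
                )
                token_feats.append(max(0.0, act))  # ReLU
        features.append(token_feats)
    return features


# Toy usage with random embeddings and weights (hypothetical values).
rng = random.Random(0)
tokens = ["BRCA1", "mutations", "increase", "cancer", "risk"]
dim = 4
emb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in tokens]
kernels = {
    w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
        for _ in range(2)]  # two filters per window width
    for w in (1, 3, 5)
}
feats = ngram_conv_features(emb, kernels)
print(len(feats), len(feats[0]))  # prints "5 6"
```

Each token ends up with one feature per filter (three widths times two filters), so the feature vector reflects the word itself and its immediate neighbours. In a full tagger these per-token features would feed a trained classification layer; that part is omitted here.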

AVAILABILITY AND IMPLEMENTATION

The GRAM-CNN source code, datasets and pre-trained model are available online at: https://github.com/valdersoul/GRAM-CNN.

CONTACT

andyli@ece.ufl.edu or aconesa@ufl.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
