Enhancing Generalizability in Biomedical Entity Recognition: Self-Attention PCA-CLS Model.

Author information

Mundotiya Rajesh Kumar, Priya Juhi, Kuwarbi Divya, Singh Teekam

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Nov-Dec;21(6):1934-1941. doi: 10.1109/TCBB.2024.3429234. Epub 2024 Dec 10.

Abstract

One of the primary tasks in the early stages of data mining involves the identification of entities from biomedical corpora. Traditional approaches relying on robust feature engineering face challenges when learning from available (un-)annotated data using data-driven models like deep learning-based architectures. Despite leveraging large corpora and advanced deep learning models, domain generalization remains an issue. Attention mechanisms are effective in capturing longer sentence dependencies and extracting semantic and syntactic information from limited annotated datasets. To address out-of-vocabulary challenges in biomedical text, the PCA-CLS (Position and Contextual Attention with CNN-LSTM-Softmax) model combines global self-attention and character-level convolutional neural network techniques. The model's performance is evaluated on eight distinct biomedical domain datasets encompassing entities such as genes, drugs, diseases, and species. The PCA-CLS model outperforms several state-of-the-art models, achieving notable F-scores, including 88.19% on BC2GM, 85.44% on JNLPBA, 90.80% on BC5CDR-chemical, 87.07% on BC5CDR-disease, 89.18% on BC4CHEMD, 88.81% on NCBI, and 91.59% on the s800 dataset.
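
The F-scores above are reported without a description of the scoring protocol. As an illustration only, the following minimal sketch shows how an entity-level F-score is commonly computed for BIO-tagged sequences, assuming exact-match span scoring (an entity counts as correct only when its boundaries and type both match). The tag names, the helper functions `bio_to_spans` and `f1_score`, and the toy sentences are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# A span is (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence.

    Assumption: a span starts at a B- tag and continues over I- tags of the
    same type. Stray I- tags with no preceding B- tag are ignored.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 for a single tagged sequence."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the gene span matches, the disease span is off by one token.
gold_tags = ["B-Gene", "I-Gene", "O", "B-Disease", "O"]
pred_tags = ["B-Gene", "I-Gene", "O", "O", "B-Disease"]
score = f1_score(gold_tags, pred_tags)
print(f"F1 = {score:.2f}")
```

In the toy example one of two predicted spans matches one of two gold spans, so precision and recall are both 0.5. A corpus-level score would pool the span counts over all sentences before computing precision and recall.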
