ADPG: Biomedical entity recognition based on Automatic Dependency Parsing Graph.

Authors

Yang Yumeng, Lin Hongfei, Yang Zhihao, Zhang Yijia, Zhao Di, Huai Shuaiheng

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

School of Information Science and Technology, Dalian Maritime University, Dalian, China.

Publication information

J Biomed Inform. 2023 Apr;140:104317. doi: 10.1016/j.jbi.2023.104317. Epub 2023 Feb 17.

Abstract

Named entity recognition is a key task in text mining. In the biomedical field, entity recognition focuses on extracting key information from large-scale biomedical texts for downstream information extraction tasks. Biomedical literature contains a large amount of long-dependent text, and previous studies have used external syntactic parsing tools to capture word dependencies in sentences in order to recognize nested biomedical entities. However, adding external parsing tools often introduces unnecessary noise into the auxiliary task and cannot improve entity recognition performance in an end-to-end way. Therefore, we propose a novel automatic dependency parsing approach, the ADPG model, which fuses syntactic structure information in an end-to-end way to recognize biomedical entities. Specifically, the method uses a multilayer Tree-Transformer structure to automatically extract the semantic representation and syntactic structure of long-dependent sentences, and then applies a multilayer graph attention neural network (GAT) to extract the dependency paths between words in the syntactic structure, improving the performance of biomedical entity recognition. We evaluated our ADPG model on three biomedical-domain datasets and one news-domain dataset, and the experimental results demonstrate that our model achieves state-of-the-art results on all four datasets and shows a degree of generalization. Our model is released on GitHub: https://github.com/Yumeng-Y/ADPG.
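
The graph attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the GitHub repository for that); it is a generic single-head graph attention layer in plain Python, assuming the token graph is already available as a 0/1 adjacency matrix. In ADPG that structure is induced by the Tree-Transformer rather than supplied by hand. The names `gat_layer`, `H`, `adj`, `W`, and `a`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(H, adj, W, a):
    """Single-head graph attention over the neighbours given by adj.

    H   : list of n token feature vectors (length d_in each)
    adj : n x n 0/1 matrix; adj[i][j] = 1 if token j is linked to token i
    W   : d_out x d_in projection matrix (assumed, normally learned)
    a   : attention vector of length 2 * d_out (assumed, normally learned)
    Returns (new_features, attention_matrix).
    """
    Z = [matvec(W, h) for h in H]  # project every token
    n = len(Z)
    out, attn = [], []
    for i, zi in enumerate(Z):
        # Attend only to linked tokens, plus a self-loop.
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, zi + Z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax restricted to the neighbourhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        row = [0.0] * n
        for j, al in zip(nbrs, alphas):
            row[j] = al
        attn.append(row)
        # New representation: attention-weighted sum of neighbour projections.
        out.append([
            sum(al * Z[j][k] for j, al in zip(nbrs, alphas))
            for k in range(len(zi))
        ])
    return out, attn


# Toy example: three tokens in a chain 0 - 1 - 2 (made-up features).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.3, -0.1, 0.2, 0.5]
features, attention = gat_layer(H, adj, W, a)
```

Each row of `attention` sums to 1 and is zero wherever two tokens are not linked, so information flows only along the edges of the (here hand-written) syntactic graph. Stacking several such layers lets a token reach tokens that are several edges away, which is the sense in which a multilayer GAT can follow dependency paths.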
