
ADPG: Biomedical entity recognition based on Automatic Dependency Parsing Graph.

Author information

Yang Yumeng, Lin Hongfei, Yang Zhihao, Zhang Yijia, Zhao Di, Huai Shuaiheng

Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, China.

School of Information Science and Technology, Dalian Maritime University, Dalian, China.

Publication information

J Biomed Inform. 2023 Apr;140:104317. doi: 10.1016/j.jbi.2023.104317. Epub 2023 Feb 17.

Abstract

Named entity recognition is a key task in text mining. In the biomedical field, it focuses on extracting key information from large-scale biomedical texts for downstream information extraction tasks. Biomedical literature contains many sentences with long-range dependencies, and previous studies have used external syntactic parsing tools to capture word dependencies within sentences in order to recognize nested biomedical entities. However, external parsing tools often introduce unnecessary noise, and they cannot improve entity recognition in an end-to-end way. We therefore propose a novel automatic dependency parsing approach, the ADPG model, which fuses syntactic structure information end to end to recognize biomedical entities. Specifically, the method uses a multilayer Tree-Transformer to automatically extract the semantic representation and syntactic structure of sentences with long-range dependencies, and then applies a multilayer graph attention network (GAT) to extract the dependency paths between words in that syntactic structure, improving biomedical entity recognition performance. We evaluated the ADPG model on three biomedical-domain datasets and one news-domain dataset. The experimental results show that the model achieves state-of-the-art results on all four datasets and shows a degree of generalization ability. Our model is released on GitHub: https://github.com/Yumeng-Y/ADPG.
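To make the graph attention step more concrete, the following is a minimal sketch of one single-head graph attention layer applied to a toy dependency graph. It is not the authors' implementation. The example sentence, the hand-written dependency edges, the feature sizes, and the names `gat_layer`, `H`, `W`, and `a` are all assumptions made for illustration. In ADPG the syntactic structure is induced automatically by the Tree-Transformer rather than supplied by hand, and the output nonlinearity and multi-head attention of a full GAT are omitted here for brevity.

```python
import math
import random


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    # Multiply matrix W (list of rows) by vector h.
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(H, edges, W, a):
    """One single-head graph attention layer (illustrative sketch).

    H:     list of node feature vectors, one per token
    edges: iterable of (i, j) index pairs, treated as undirected
    W:     out_dim x in_dim projection matrix
    a:     attention vector of length 2 * out_dim
    Returns the updated node vectors and, per node, a dict of
    attention weights over its neighbours (self-loop included).
    """
    n = len(H)
    nbrs = [{i} for i in range(n)]  # every node attends to itself
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)

    Z = [matvec(W, h) for h in H]  # projected node features
    out, attn = [], []
    for i in range(n):
        js = sorted(nbrs[i])
        # Unnormalised score: LeakyReLU(a . [z_i || z_j])
        scores = [
            leaky_relu(sum(x * y for x, y in zip(a, Z[i] + Z[j])))
            for j in js
        ]
        # Softmax restricted to the neighbours of node i
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = {j: e / total for j, e in zip(js, exps)}
        # New representation: attention-weighted sum of neighbour features
        out.append(
            [sum(alphas[j] * Z[j][k] for j in js) for k in range(len(Z[i]))]
        )
        attn.append(alphas)
    return out, attn


# Hypothetical toy sentence with hand-written dependency edges (head, dependent).
tokens = ["BRCA1", "mutations", "increase", "cancer", "risk"]
edges = [(1, 0), (2, 1), (2, 4), (4, 3)]

rng = random.Random(0)
in_dim, out_dim = 3, 4  # arbitrary sizes chosen for the sketch
H = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in tokens]
W = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
a = [rng.uniform(-1, 1) for _ in range(2 * out_dim)]

out, attn = gat_layer(H, edges, W, a)
for tok, weights in zip(tokens, attn):
    print(tok, {tokens[j]: round(w, 3) for j, w in weights.items()})
```

Because the softmax runs only over each token's neighbours in the graph, a token such as "increase" draws information directly from "mutations" and "risk" regardless of how far apart they sit in the sentence. Stacking several such layers lets information travel along longer dependency paths.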

