Joint extraction of Chinese medical entities and relations based on RoBERTa and single-module global pointer.

Authors

Li Dongmei, Yang Yu, Cui Jinman, Meng Xianghao, Qu Jintao, Jiang Zhuobin, Zhao Yufeng

Affiliations

School of Information Science and Technology, Beijing Forestry University, 100083, Beijing, China.

Engineering Research Center for Forestry-Oriented Intelligent Information Processing of National Forestry and Grassland Administration, 100083, Beijing, China.

Publication

BMC Med Inform Decis Mak. 2024 Jul 31;24(1):218. doi: 10.1186/s12911-024-02577-1.

DOI:10.1186/s12911-024-02577-1
PMID:39085892
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11293210/
Abstract

BACKGROUND

Joint entity and relation extraction from Chinese medical text typically involves numerous nested entities, overlapping relations, and other difficult cases. To handle these, some traditional methods decompose the joint extraction task into multiple steps or modules, which in turn introduces local dependencies.

METHODS

To alleviate this issue, we propose RSGP, a joint extraction model for Chinese medical entities and relations based on RoBERTa and a single-module global pointer, which formulates joint extraction as a global pointer linking problem. To account for the structure of the Chinese language, we use the RoBERTa-wwm pre-trained language model at the encoding layer to obtain better embedding representations. We then represent the input sentence as a third-order tensor and score each position in the tensor in preparation for decoding the triples. Finally, we design a novel single-module global pointer decoding approach that reduces the generation of redundant information. In particular, we analyze the decoding of single-character entities separately, which improves the time and space performance of RSGP to some extent.
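The scoring-then-decoding idea can be illustrated with a minimal sketch. The abstract does not specify RSGP's tag set, tensor layout, or matching rules, so everything below is an assumption: the three link tags (`HB_TB`, `HB_TE`, `HE_TE`) are borrowed loosely from tag-linking decoders, the third-order tensor is laid out as `scores[channel][i][j]` with three channels per relation, a cell counts as linked when its score exceeds 0, and the nearest-match rule is a simple heuristic. The relation name `clinical_manifestation` and the helper names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical link tags (NOT taken from the paper):
# HB_TB: head begin  -> tail begin
# HB_TE: head begin  -> tail end
# HE_TE: head end    -> tail end
HB_TB, HB_TE, HE_TE = 0, 1, 2
NUM_TAGS = 3

Tensor3 = List[List[List[float]]]  # scores[channel][i][j]
Triple = Tuple[str, str, str]      # (head text, relation, tail text)


def empty_scores(num_relations: int, n: int, fill: float = -1.0) -> Tensor3:
    """Build a (num_relations * NUM_TAGS) x n x n tensor of 'not linked' scores."""
    return [[[fill] * n for _ in range(n)] for _ in range(num_relations * NUM_TAGS)]


def decode_triples(text: str, relations: List[str], scores: Tensor3,
                   threshold: float = 0.0) -> Set[Triple]:
    """Decode triples by linking positions whose score exceeds the threshold."""
    n = len(text)
    triples: Set[Triple] = set()
    for r, rel in enumerate(relations):
        base = r * NUM_TAGS  # first channel belonging to relation r

        def linked(tag: int, i: int, j: int) -> bool:
            return scores[base + tag][i][j] > threshold

        for hb in range(n):
            for tb in range(n):
                if not linked(HB_TB, hb, tb):
                    continue
                # Nearest tail end at or after the tail begin (assumed heuristic).
                te: Optional[int] = next(
                    (j for j in range(tb, n) if linked(HB_TE, hb, j)), None)
                if te is None:
                    continue
                # Nearest head end at or after the head begin (assumed heuristic).
                he: Optional[int] = next(
                    (i for i in range(hb, n) if linked(HE_TE, i, te)), None)
                if he is None:
                    continue
                triples.add((text[hb:he + 1], rel, text[tb:te + 1]))
    return triples


# Toy example: "感冒引起发热" ("a cold causes fever"), characters indexed 0..5.
text = "感冒引起发热"
relations = ["clinical_manifestation"]  # hypothetical relation label
scores = empty_scores(len(relations), len(text))
scores[HB_TB][0][4] = 1.0  # 感 -> 发
scores[HB_TE][0][5] = 1.0  # 感 -> 热
scores[HE_TE][1][5] = 1.0  # 冒 -> 热
triples = decode_triples(text, relations, scores)
print(triples)
```

In this layout a single-character entity has its begin and end at the same position, so its tags fall on the same cell in different channels. Whether RSGP treats that case the same way is not stated in the abstract.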

RESULTS

To verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out experiments on the public CMeIE dataset. The results show that RSGP performs significantly better than the baseline models on the joint extraction of Chinese medical entities and relations, achieving state-of-the-art results.
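The abstract does not say which metric underlies the comparison. A common choice for triple extraction is micro-averaged precision, recall, and F1 with exact triple matching, and the sketch below assumes that choice. The function name `micro_prf` and the toy triples are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head text, relation, tail text)


def micro_prf(gold: List[Set[Triple]],
              pred: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-sentence triple sets.

    A predicted triple counts as correct only if it exactly matches a gold
    triple of the same sentence (assumed matching rule).
    """
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data: two sentences, three gold triples, three predicted triples.
gold = [
    {("感冒", "clinical_manifestation", "发热"), ("感冒", "clinical_manifestation", "咳嗽")},
    {("肺炎", "clinical_manifestation", "咳嗽")},
]
pred = [
    {("感冒", "clinical_manifestation", "发热")},
    {("肺炎", "clinical_manifestation", "咳嗽"), ("肺炎", "clinical_manifestation", "发热")},
]
precision, recall, f1 = micro_prf(gold, pred)
print(precision, recall, f1)
```

Here two of the three predictions are correct and two of the three gold triples are recovered, so precision, recall, and F1 all come out to 2/3.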

CONCLUSION

The proposed RSGP can effectively extract entities and relations from Chinese medical texts and helps to structure such texts, providing high-quality data support for the construction of Chinese medical knowledge graphs.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16d/11293210/ad421a8d5de6/12911_2024_2577_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16d/11293210/30c184f5dacc/12911_2024_2577_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16d/11293210/76ffcaebedb3/12911_2024_2577_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16d/11293210/2dabb6057895/12911_2024_2577_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b16d/11293210/15bbad931b60/12911_2024_2577_Fig5_HTML.jpg
