

CLART: A cascaded lattice-and-radical transformer network for Chinese medical named entity recognition.

Authors

Xiao Yinlong, Ji Zongcheng, Li Jianqiang, Zhu Qing

Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China.

PAII Inc., CA 94087, United States of America.

Publication information

Heliyon. 2023 Oct 10;9(10):e20692. doi: 10.1016/j.heliyon.2023.e20692. eCollection 2023 Oct.

Abstract

Chinese medical named entity recognition (NER) is a fundamental task in Chinese medical natural language processing, aiming to recognize Chinese medical entities within unstructured medical texts. However, it poses significant challenges, mainly due to the extensive usage of medical terms in Chinese medical texts. Although previous studies have attempted to incorporate lexical or radical knowledge to improve the comprehension of medical texts, they either focus solely on one of these aspects or combine the features with a basic concatenation operation, which fails to fully utilize the potential of lexical and radical knowledge. In this paper, we propose a novel Cascaded LAttice-and-Radical Transformer (CLART) network to exploit both lexical and radical information for Chinese medical NER. Specifically, given a sentence, a medical lexicon, and a radical dictionary, we first construct a flat lattice (i.e., a character-word sequence) for the sentence through word matching and obtain the radical components of each Chinese character through radical parsing. We then employ a lattice Transformer module to capture the dense interactions between characters and matched words, facilitating better utilization of lexical knowledge. Subsequently, we design a radical Transformer module to model the dense interactions between the lattice and radical features, facilitating better fusion of the lexical and radical knowledge. Finally, we feed the updated lattice-and-radical-aware character representations into a conditional random field (CRF) decoder to obtain the predicted labels. Experiments on two publicly available Chinese medical NER datasets show the effectiveness of the proposed method.
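The input-construction step described in the abstract (word matching to build a flat lattice, plus radical parsing) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the brute-force substring matching, the `max_word_len` limit, the head/tail index layout, and the toy lexicon and radical dictionary are all assumptions made for the example. The Transformer modules and the CRF decoder are not shown.

```python
from typing import Dict, List, Set, Tuple

# A lattice token: (text, head index, tail index), both indices inclusive.
# The head/tail layout is an assumption; the abstract only says "character-word sequence".
Token = Tuple[str, int, int]


def build_flat_lattice(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[Token]:
    """Return all characters first, followed by every lexicon word matched in the sentence."""
    # Each character is a token whose head and tail coincide.
    tokens: List[Token] = [(ch, i, i) for i, ch in enumerate(sentence)]
    # Brute-force match of every substring of length 2..max_word_len against the lexicon.
    for start in range(len(sentence)):
        last_end = min(len(sentence), start + max_word_len)
        for end in range(start + 2, last_end + 1):
            word = sentence[start:end]
            if word in lexicon:
                tokens.append((word, start, end - 1))
    return tokens


def parse_radicals(sentence: str, radical_dict: Dict[str, List[str]]) -> List[List[str]]:
    """Look up the radical components of each character; unknown characters map to themselves."""
    return [radical_dict.get(ch, [ch]) for ch in sentence]


# Toy inputs, invented purely for illustration.
sentence = "患者胃痛"
lexicon = {"患者", "胃痛"}
radical_dict = {"胃": ["田", "月"], "痛": ["疒", "甬"]}

lattice = build_flat_lattice(sentence, lexicon)
radicals = parse_radicals(sentence, radical_dict)
```

In this sketch the lattice holds the four characters followed by the two matched words, each tagged with the span it covers, while `radicals` holds one component list per character. In a pipeline like the one the abstract describes, such sequences would then be embedded and passed to the lattice and radical Transformer modules.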


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/875f/10590790/c801dde02ef2/gr001.jpg
