Shi Jintong, Sun Mengxuan, Sun Zhengya, Li Mingda, Gu Yifan, Zhang Wensheng
University of Chinese Academy of Sciences, Beijing, 100049, China; Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
J Biomed Inform. 2022 Sep;133:104144. doi: 10.1016/j.jbi.2022.104144. Epub 2022 Jul 22.
Medical named entity recognition (MNER) is a fundamental step in understanding the unstructured medical text in electronic health records, and it has received widespread attention in both academia and industry. However, previous MNER approaches do not make full use of hierarchical semantics, which range from morphology to syntactic relationships such as word dependency. Furthermore, extracting entities from Chinese medical text is more complex because such text often contains, for example, homophones and pictophonetic characters. In this paper, we propose a multi-level semantic fusion network for Chinese medical named entity recognition that fuses semantic information at the morphological, character, word, and syntactic levels. We take radicals as morphological semantics, pinyin and a character dictionary as character semantics, and a word dictionary as word semantics; these features are fused by a BiLSTM to obtain a contextualized representation. We then use a graph neural network to model word dependency as syntactic semantics, which enhances the contextualized representation. Experimental results show the effectiveness of the proposed model on two public datasets and its robustness in real-world scenarios.
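The data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fuse_features` and `propagate` are hypothetical, the feature vectors are toy values, the BiLSTM step is omitted entirely, and the graph neural network is replaced by a single round of plain neighbor averaging over dependency edges. It only shows how per-token features at several levels might be concatenated and how dependency links could then mix information between tokens.

```python
from typing import List, Tuple

Vector = List[float]


def fuse_features(radical: Vector, pinyin: Vector, char: Vector, word: Vector) -> Vector:
    """Concatenate the four per-token feature vectors into one input vector.

    Assumption: simple concatenation. The paper feeds the fused features
    to a BiLSTM, which this sketch does not model.
    """
    return radical + pinyin + char + word


def propagate(states: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
    """Run one round of mean aggregation over dependency edges.

    Each token averages its own state with the states of the tokens it is
    linked to. Edges are treated as undirected, and every token keeps a
    self-loop. This stands in for the graph neural network layer.
    """
    n = len(states)
    neighbors = [{i} for i in range(n)]  # self-loops
    for head, dependent in edges:
        neighbors[head].add(dependent)
        neighbors[dependent].add(head)

    dim = len(states[0])
    updated = []
    for i in range(n):
        group = sorted(neighbors[i])
        updated.append(
            [sum(states[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return updated


# Toy usage: one-dimensional features for a single token, then three
# token states connected in a chain 0 - 1 - 2.
fused = fuse_features([1.0], [2.0], [3.0], [4.0])
enhanced = propagate([[0.0], [3.0], [6.0]], [(0, 1), (1, 2)])
```

In the toy chain, the middle token averages over all three states while the end tokens average over two, which is the sense in which dependency structure can enhance a contextualized representation.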