State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing, China.
Department of Cardiology, The Second Medical Center, National Clinical Research Center for Geriatric Diseases, Chinese PLA General Hospital, Beijing, China.
BMC Med Inform Decis Mak. 2022 Jul 30;22(1):201. doi: 10.1186/s12911-022-01924-4.
Named entity recognition (NER) is a key and fundamental part of many medical and clinical tasks, including the establishment of medical knowledge graphs, decision-making support, and question answering systems. When extracting entities from electronic health records (EHRs), NER models mostly apply long short-term memory (LSTM) and achieve surprisingly strong performance in clinical NER. However, these LSTM-based models often require increased network depth to capture long-distance dependencies. Therefore, LSTM-based models that achieve high accuracy generally require long training times and extensive training data, which has obstructed their adoption in clinical scenarios where training time is limited.
Inspired by the Transformer, we combine the Transformer with a Soft Term Position Lattice to form a soft lattice structure Transformer, which models long-distance dependencies in a manner similar to LSTM. Our model consists of four components: the WordPiece module, the BERT module, the soft lattice structure Transformer module, and the CRF module.
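The core idea of a word lattice is to augment the character sequence with every lexicon word that matches a span of characters, so each lattice node carries a head (start) and tail (end) position that the soft position encoding can use. The following minimal sketch illustrates this construction only; the sentence and the medical lexicon are hypothetical examples, not data from the paper, and the paper's actual model additionally feeds these positions into BERT and the Transformer layers.

```python
# Hedged sketch of word-lattice construction over a Chinese character
# sequence. The lexicon and sentence below are illustrative placeholders.

def build_lattice(chars, lexicon):
    """Return lattice nodes as (token, head, tail) triples.

    Every character is its own node spanning one position; every
    lexicon word found in the sequence is added as an extra node
    spanning its head (start) and tail (end) character positions.
    """
    nodes = [(c, i, i) for i, c in enumerate(chars)]
    text = "".join(chars)
    for word in lexicon:
        start = text.find(word)
        while start != -1:
            nodes.append((word, start, start + len(word) - 1))
            start = text.find(word, start + 1)
    return nodes

chars = list("高血压病史")            # "history of hypertension"
lexicon = {"高血压", "血压", "病史"}   # hypothetical medical lexicon
lattice = build_lattice(chars, lexicon)
# lattice holds 5 character nodes plus 3 matched word nodes,
# e.g. ("高血压", 0, 2) spans characters 0 through 2.
```

Because word nodes overlap character nodes (e.g. "血压" inside "高血压"), the downstream attention layers can weigh competing segmentations instead of committing to a single word segmentation up front.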
Our experiments demonstrated that this approach increased F1 by 1-5% on the CCKS NER task compared to other LSTM-CRF models while consuming less training time. Additional evaluations showed that the lattice structure Transformer performs well at recognizing long medical terms, abbreviations, and numbers. The proposed model achieves an F-measure of 91.6% in recognizing long medical terms and 90.36% in recognizing abbreviations and numbers.
By using the soft lattice structure Transformer, the method proposed in this paper maps Chinese words into lattice information, making our model suitable for Chinese clinical medical records. Transformers with multilayer soft lattice Chinese word construction can capture potential interactions between Chinese characters and words.