
A multi-layer soft lattice based model for Chinese clinical named entity recognition.

Affiliations

State Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology, Beijing, China.

Department of Cardiology, The Second Medical Center, National Clinical Research Center for Geriatric Diseases, Chinese PLA General Hospital, Beijing, China.

Publication information

BMC Med Inform Decis Mak. 2022 Jul 30;22(1):201. doi: 10.1186/s12911-022-01924-4.

Abstract

OBJECTIVE

Named entity recognition (NER) is a key and fundamental component of many medical and clinical tasks, including the construction of medical knowledge graphs, decision support, and question answering systems. When extracting entities from electronic health records (EHRs), most NER models apply long short-term memory (LSTM) networks and achieve strong performance in clinical NER. However, these LSTM-based models often require deeper networks to capture long-distance dependencies. As a result, LSTM-based models that achieve high accuracy generally require long training times and extensive training data, which has hindered their adoption in clinical scenarios where training time is limited.

METHOD

Inspired by the Transformer, we combine a Transformer with a Soft Term Position Lattice to form a soft lattice structure Transformer, which models long-distance dependencies in a manner similar to LSTM. Our model consists of four components: a WordPiece module, a BERT module, a soft lattice structure Transformer module, and a CRF module.
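The "soft term position" idea can be illustrated with a small sketch: each character receives position tags (B/M/E/S) from every lexicon word that covers it, and these soft tags feed the lattice structure Transformer. The function name, the B/M/E/S tag scheme, and the toy lexicon below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: collect soft lexicon-match position tags per character.
# Tag scheme (assumed): B = word begin, M = word middle, E = word end, S = single.

def soft_lattice_tags(sentence, lexicon, max_word_len=4):
    """For each character, return the set of position tags of all
    lexicon words (up to max_word_len characters) covering it."""
    tags = [set() for _ in sentence]
    for start in range(len(sentence)):
        for length in range(1, max_word_len + 1):
            end = start + length
            if end > len(sentence):
                break
            word = sentence[start:end]
            if word in lexicon:
                if length == 1:
                    tags[start].add("S")          # single-character word
                else:
                    tags[start].add("B")          # word-initial character
                    for mid in range(start + 1, end - 1):
                        tags[mid].add("M")        # word-internal characters
                    tags[end - 1].add("E")        # word-final character
    return tags
```

For example, with the toy lexicon {"高血压", "血压"}, the character 血 in "高血压病" receives both "M" (middle of 高血压) and "B" (start of 血压), while 病, covered by no word, receives no tags. In the full model these overlapping, possibly conflicting tags are what makes the lattice "soft": all candidate word boundaries are kept rather than committing to one segmentation.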

RESULT

Our experiments demonstrate that this approach improves F1 by 1-5% on the CCKS NER task compared with other LSTM-CRF-based models, while requiring less training time. Additional evaluations show that the lattice structure Transformer performs well on long medical terms, abbreviations, and numbers: the proposed model achieves an F1 of 91.6% on long medical terms and 90.36% on abbreviations and numbers.

CONCLUSIONS

By using the soft lattice structure Transformer, the method proposed in this paper maps Chinese words to lattice information, making our model well suited to Chinese clinical records. Transformers with multilayer soft lattice Chinese word construction can capture potential interactions between Chinese characters and words.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1eff/9338545/39091689dbde/12911_2022_1924_Fig1_HTML.jpg
