Automatic SNOMED CT coding of Chinese clinical terms via attention-based semantic matching.

Authors

Chen Yani, Hu Danqing, Li Mengyang, Duan Huilong, Lu Xudong

Affiliations

College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, 310027 Hangzhou, China.

Publication

Int J Med Inform. 2022 Mar;159:104676. doi: 10.1016/j.ijmedinf.2021.104676. Epub 2021 Dec 28.

Abstract

BACKGROUND

A considerable amount of meaningful information is routinely recorded as text in Chinese clinical data; these text entries are referred to as Chinese clinical terms. The lack of coding is a major obstacle to the application of clinical terms. SNOMED CT is a widely used, comprehensive clinical health care terminology, valued for its coverage, granularity, clinical orientation, and logical underpinning. Automatically assigning SNOMED CT codes to Chinese clinical terms is useful and efficient, but several problems remain. Current cross-language clinical term matching studies rely on external resources and techniques, such as machine translation and rule-based methods. Semantic matching methods have achieved strong performance on text matching, but few studies have applied them to cross-language clinical term matching. We present an effective attention-based semantic matching algorithm that automatically codes Chinese clinical terms with SNOMED CT across languages.

METHOD

First, BERT was used to convert the input into word embeddings. Next, the word embeddings were encoded by a BiLSTM with self-attention, which captures long-distance relationships among words and weights each word according to its contribution to semantic matching. Decomposable attention was then applied so that semantic matching is trivially parallelizable, which speeds up computation. Finally, fully connected layers and a sigmoid function were used to output the matching result.
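To make the matching step more concrete, here is a minimal sketch of decomposable attention (the attend, compare, aggregate scheme of Parikh et al., 2016) followed by a sigmoid output. It assumes the token vectors have already been produced by the BERT + BiLSTM encoder; the alignment function F and comparison function G are simplified to identity and concatenation, the fully connected layers are reduced to a single unit, and all vectors and weights are hypothetical. It is an illustration of the general technique, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def decomposable_attention(a: List[Vector], b: List[Vector]) -> Vector:
    """Attend, compare, aggregate; F is identity and G is concatenation here."""
    # Attend: unnormalised alignment scores e[i][j] = a_i . b_j
    e = [[dot(ai, bj) for bj in b] for ai in a]
    # beta_i: soft alignment of a_i against b (softmax over j)
    beta = [weighted_sum(softmax(row), b) for row in e]
    # alpha_j: soft alignment of b_j against a (softmax over i)
    cols = [[e[i][j] for i in range(len(a))] for j in range(len(b))]
    alpha = [weighted_sum(softmax(col), a) for col in cols]
    # Compare: pair each token with its aligned subphrase (list concatenation)
    v1 = [ai + bi for ai, bi in zip(a, beta)]
    v2 = [bj + aj for bj, aj in zip(b, alpha)]
    # Aggregate: sum over tokens on each side, then join the two summaries
    agg1 = [sum(col) for col in zip(*v1)]
    agg2 = [sum(col) for col in zip(*v2)]
    return agg1 + agg2


def match_probability(features: Vector, weights: Vector, bias: float) -> float:
    # One fully connected unit plus a sigmoid, standing in for the FC layers
    z = dot(features, weights) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical encoder outputs: 2 tokens of a Chinese term, 3 tokens of an
# English SNOMED CT description, each a 2-dimensional vector.
chinese = [[1.0, 0.0], [0.0, 1.0]]
english = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
feats = decomposable_attention(chinese, english)
w = [0.1] * len(feats)  # hypothetical weights; a trained model learns these
p = match_probability(feats, w, bias=-0.5)
print(f"match probability: {p:.3f}")
```

Because every score e[i][j] depends only on one pair of token vectors, all pairs can be computed independently, which is the property the abstract refers to as making the matching trivially parallelizable.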

RESULTS

A total of 29,960 manually coded Chinese clinical terms, 30,040 unmatched Chinese clinical terms, and SNOMED CT codes were collected to evaluate the proposed method. Compared with existing semantic matching methods, the proposed approach achieved state-of-the-art results, with an accuracy of 0.905, a precision of 0.856, a recall of 0.518, and an F-measure of 0.645, demonstrating its effectiveness. The proposed Chinese-English bilingual term mapping, Chinese character-level and word-level encoders, English word-level encoder, BERT model, and attention mechanism each performed better than the alternative methods compared.
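Assuming the reported F-measure is the standard F1 score (the harmonic mean of precision and recall), the reported figures are internally consistent, as this small check shows. It also makes visible that the F-measure is pulled down mainly by the comparatively low recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract
f1 = f_measure(0.856, 0.518)
print(round(f1, 3))  # 0.645
```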

CONCLUSION

The proposed approach, which automatically assigns SNOMED CT codes to Chinese clinical terms via attention-based semantic matching, can improve both the performance and the efficiency of automated SNOMED CT code assignment for Chinese clinical terms.
