Automatic SNOMED CT coding of Chinese clinical terms via attention-based semantic matching.

Author information

Chen Yani, Hu Danqing, Li Mengyang, Duan Huilong, Lu Xudong

Affiliation

College of Biomedical Engineering and Instrument Science, Zhejiang University, Zheda Road, 310027 Hangzhou, China.

Publication information

Int J Med Inform. 2022 Mar;159:104676. doi: 10.1016/j.ijmedinf.2021.104676. Epub 2021 Dec 28.

DOI: 10.1016/j.ijmedinf.2021.104676
PMID: 34990940
Abstract

BACKGROUND

A considerable amount of meaningful information in Chinese clinical data is routinely recorded as free text, referred to here as Chinese clinical terms. The lack of coding is a major obstacle to using these terms. SNOMED CT is a comprehensive and widely used clinical health care terminology, valued for its coverage, granularity, clinical orientation, and logical underpinning. Automatically assigning SNOMED CT codes to Chinese clinical terms would be useful and efficient, but the task still faces several problems. Current cross-language clinical term matching studies rely on external resources and techniques, such as machine translation and rule-based methods. Semantic matching methods have achieved strong performance on text matching, but few studies have applied them to cross-language clinical term matching. We present an effective attention-based semantic matching algorithm that automatically assigns SNOMED CT codes to Chinese clinical terms across languages.

METHOD

First, BERT was used to convert the input into word embeddings. The word embeddings were then encoded by a BiLSTM with self-attention, which captures long-distance relationships among words and weights each word according to its contribution to semantic matching. Next, decomposable attention was used to make semantic matching trivially parallelizable and thus speed up computation. Finally, fully connected layers and a sigmoid function were used to output the matching result.
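
The decomposable attention and sigmoid steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy vectors stand in for the BERT and BiLSTM outputs, the feed-forward networks that decomposable attention normally applies are replaced by identity functions, and every name (`soft_align`, `compare_and_aggregate`, `match_score`, the weights and bias) is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def soft_align(a, b):
    """Attend: for each vector in a, build an attention-weighted summary of b."""
    aligned = []
    for a_i in a:
        attention = softmax([dot(a_i, b_j) for b_j in b])
        aligned.append(weighted_sum(attention, b))
    return aligned


def compare_and_aggregate(seq, aligned):
    """Compare: concatenate each vector with its summary. Aggregate: sum over positions."""
    compared = [vec + summary for vec, summary in zip(seq, aligned)]
    dim = len(compared[0])
    return [sum(c[k] for c in compared) for k in range(dim)]


def match_score(a, b, weights, bias):
    """Single linear layer plus sigmoid, standing in for the fully connected layers."""
    features = (compare_and_aggregate(a, soft_align(a, b))
                + compare_and_aggregate(b, soft_align(b, a)))
    logit = dot(weights, features) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-dimensional "encodings" (assumed values, not real model outputs).
chinese_term = [[1.0, 0.0], [0.0, 1.0]]
snomed_term = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
weights = [0.1] * 8  # 2 sequences x (2 + 2) concatenated dimensions
score = match_score(chinese_term, snomed_term, weights, bias=-0.5)
print(f"match probability: {score:.3f}")
```

Each position is aligned and compared independently of the others, with no recurrence across positions, which is why this step parallelizes easily.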

RESULTS

To evaluate the proposed method, 29,960 manually coded Chinese clinical terms, 30,040 unmatched Chinese clinical terms, and SNOMED CT codes were collected. Compared with the existing semantic matching method, the proposed approach achieved state-of-the-art results, with an accuracy of 0.905, a precision of 0.856, a recall of 0.518, and an F-measure of 0.645. The proposed Chinese-English bilingual term mapping, Chinese character-level and word-level encoders, English word-level encoder, BERT model, and attention mechanism performed better than the alternative methods.
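
As a consistency check on the reported metrics, the short sketch below recomputes the F-measure from the stated precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall). The abstract does not state the weighting, and the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
f1 = f_measure(0.856, 0.518)
print(round(f1, 3))  # 0.645
```

Under that assumption the recomputed value agrees with the reported 0.645, and the gap between precision and recall shows that the F-measure is held down mainly by recall.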

CONCLUSION

The proposed attention-based semantic matching approach can improve both the performance and the efficiency of automated SNOMED CT code assignment for Chinese clinical terms.
