SiBERT: A Siamese-based BERT network for Chinese medical entities alignment.

Affiliations

Faculty of Science, Beijing University of Technology, Beijing, China.

Faculty of Information, Beijing University of Technology, Beijing, China.

Publication information

Methods. 2022 Sep;205:133-139. doi: 10.1016/j.ymeth.2022.07.003. Epub 2022 Jul 4.

DOI:10.1016/j.ymeth.2022.07.003

PMID:35798258

Abstract

Entity alignment aims to associate semantically similar entities in knowledge graphs from different sources. It is widely used in the integration and construction of professional medical knowledge. Existing deep learning methods lack a term-level embedding representation, which limits entity alignment performance and incurs substantial computational overhead. To address these problems, we propose a Siamese-based BERT (SiBERT) for Chinese medical entity alignment. SiBERT generates term-level embeddings from word embedding sequences to enhance the features of entities in similarity calculation. The entity alignment process consists of three steps. First, SiBERT is pre-trained with a synonym dictionary in the public domain and then transferred to the medical entity alignment task. Second, entities in four categories (disease, symptom, treatment, and examination) are labeled with standard terms selected from a standard terms dataset; the entities and their standard terms form the term pairs used to train SiBERT. Finally, combined with the entity alignment algorithm, the most similar standard term is selected as the final result. To evaluate the effectiveness of our method, we conduct extensive experiments on real-world datasets. The experimental results show that SiBERT outperforms the compared algorithms in both alignment accuracy and computational efficiency.
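
The final step described in the abstract (turn a token embedding sequence into one term-level vector, then pick the most similar standard term) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the pooling operation or the similarity measure, so mean pooling and cosine similarity are assumptions here, the function names `mean_pool`, `cosine`, and `align` are hypothetical, and the small hand-written vectors stand in for the output of a shared BERT encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(token_vectors: Sequence[Vector]) -> Vector:
    """Average token-level vectors into one term-level vector.

    Mean pooling is an assumption; the abstract does not name the pooling step.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat it as dissimilar
    return dot / (norm_a * norm_b)


def align(entity_vec: Vector, standard_vecs: Dict[str, Vector]) -> Tuple[str, float]:
    """Return the standard term whose vector is most similar to the entity."""
    best = max(standard_vecs, key=lambda term: cosine(entity_vec, standard_vecs[term]))
    return best, cosine(entity_vec, standard_vecs[best])


if __name__ == "__main__":
    # Made-up 3-dimensional token embeddings; a real system would take these
    # from the same (Siamese, weight-sharing) encoder for both sides.
    standard_terms = {
        "hypertension": mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
        "diabetes": mean_pool([[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]]),
    }
    mention = mean_pool([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
    term, score = align(mention, standard_terms)
    print(term, round(score, 3))
```

Under these assumptions, the term-level vectors of the standard terms can be computed once and reused for every incoming entity, so each alignment needs a single encoder pass plus cheap vector comparisons. This is a general property of Siamese (bi-encoder) designs and may be one reason for the efficiency gain the abstract reports.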

Similar articles

Stacking-BERT model for Chinese medical procedure entity normalization.

Math Biosci Eng. 2023 Jan;20(1):1018-1036. doi: 10.3934/mbe.2023047. Epub 2022 Oct 24.

Named entity recognition of Chinese electronic medical records based on a hybrid neural network and medical MC-BERT.

BMC Med Inform Decis Mak. 2022 Dec 1;22(1):315. doi: 10.1186/s12911-022-02059-2.

Extracting clinical named entity for pituitary adenomas from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2022 Mar 23;22(1):72. doi: 10.1186/s12911-022-01810-z.

A deep learning model incorporating part of speech and self-matching attention for named entity recognition of Chinese electronic medical records.

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):65. doi: 10.1186/s12911-019-0762-7.

Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction.

Quant Imaging Med Surg. 2023 Jun 1;13(6):3873-3890. doi: 10.21037/qims-22-1158. Epub 2023 May 8.

A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):214. doi: 10.1186/s12911-021-01575-x.

A BIGRU-Based Stacked Attention Network for Biomedical Named Entity Recognition with Chinese EMRs.

Stud Health Technol Inform. 2023 Nov 23;308:757-767. doi: 10.3233/SHTI230909.

Multi-Level Representation Learning for Chinese Medical Entity Recognition: Model Development and Validation.

JMIR Med Inform. 2020 May 4;8(5):e17637. doi: 10.2196/17637.

Path-based knowledge reasoning with textual semantic information for medical knowledge graph completion.

BMC Med Inform Decis Mak. 2021 Nov 29;21(Suppl 9):335. doi: 10.1186/s12911-021-01622-7.

Cited by

Comparative Analysis of Large Language Models in Chinese Medical Named Entity Recognition.

Bioengineering (Basel). 2024 Sep 29;11(10):982. doi: 10.3390/bioengineering11100982.
