

SiBERT: A Siamese-based BERT network for Chinese medical entities alignment.

Affiliations

Faculty of Science, Beijing University of Technology, Beijing, China.

Faculty of Information, Beijing University of Technology, Beijing, China.

Publication

Methods. 2022 Sep;205:133-139. doi: 10.1016/j.ymeth.2022.07.003. Epub 2022 Jul 4.

Abstract

Entity alignment aims to associate semantically similar entities in knowledge graphs from different sources. It is widely used in the integration and construction of professional medical knowledge. Existing deep learning methods lack a term-level embedding representation, which limits alignment performance and incurs a large computational overhead. To address these problems, we propose a Siamese-based BERT network (SiBERT) for Chinese medical entity alignment. SiBERT generates term-level embeddings from word-embedding sequences to enhance the entity features used in similarity calculation. The entity alignment process consists of three steps. First, SiBERT is pre-trained on a public-domain synonym dictionary and then transferred to the medical entity alignment task. Second, four categories of entities (disease, symptom, treatment, and examination) are labeled against standard terms selected from the standard-term dataset; each entity and its standard term form a term pair used to train SiBERT. Finally, combined with the entity alignment algorithm, the most similar standard term is selected as the final result. To evaluate the effectiveness of our method, we conduct extensive experiments on real-world datasets. The results show that SiBERT outperforms the compared algorithms in both alignment accuracy and computational efficiency.
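The final alignment step described above (embed each term, then pick the standard term with the highest similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the character-level `embed_term` encoder is a hypothetical stand-in for SiBERT's term-level embeddings, and cosine similarity is assumed as the similarity measure.

```python
import numpy as np

def embed_term(term: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a term-level encoder: deterministic per-character
    vectors, mean-pooled into a single term embedding (hypothetical, not SiBERT)."""
    vecs = [np.random.default_rng(ord(ch)).standard_normal(dim) for ch in term]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def align(entity: str, standard_terms: list[str]) -> str:
    """Select the standard term whose embedding is most similar to the entity's."""
    e = embed_term(entity)
    return max(standard_terms, key=lambda t: cosine(e, embed_term(t)))
```

In the paper's pipeline, `embed_term` would be the Siamese BERT tower producing term-level embeddings, so both the entity mention and every candidate standard term pass through the same shared-weight encoder before the similarity comparison.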

