Cai Fenghua, He Jianfeng, Liu Yunchuan, Zhang Hongjiang
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan, China.
Department of Medical Imaging, The First People's Hospital of Anning City, Anning, China.
Front Med (Lausanne). 2024 May 9;11:1272224. doi: 10.3389/fmed.2024.1272224. eCollection 2024.
Venous thromboembolism (VTE) is characterized by high morbidity and mortality and by complex treatment. A VTE knowledge graph (VTEKG) can effectively integrate VTE-related medical knowledge and offer an intuitive description and analysis of the relations between medical entities. However, current methods for constructing knowledge graphs typically suffer from error propagation and redundant information.
In this study, we propose a deep learning-based joint extraction model for Chinese electronic medical records, the Biaffine Common-Sequence Self-Attention Linker (BCSLinker), to address these two issues, which often arise when constructing a VTEKG. First, the Biaffine Common-Sequence Self-Attention (BCsSa) module creates global matrices and extracts entities and relations simultaneously, which mitigates error propagation. Second, a multi-label cross-entropy loss reduces the impact of redundant information and improves information extraction.
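The two ideas named above can be illustrated with a minimal sketch. The abstract does not give the BCsSa equations or the exact form of the loss, so the code below assumes a generic biaffine scorer (a bilinear term plus a linear term over each token pair) and one common formulation of multi-label cross-entropy used in global-matrix extraction models. The function names, parameters, and toy vectors are hypothetical and are not taken from BCSLinker.

```python
import math


def biaffine_score(h_i, h_j, U, w, b):
    """Assumed biaffine form: h_i^T U h_j + w . [h_i; h_j] + b."""
    bilinear = sum(
        h_i[a] * U[a][c] * h_j[c]
        for a in range(len(h_i))
        for c in range(len(h_j))
    )
    linear = sum(wk * xk for wk, xk in zip(w, h_i + h_j))
    return bilinear + linear + b


def score_matrix(H, U, w, b):
    """Global n x n matrix holding one score per token pair (one label)."""
    return [[biaffine_score(h_i, h_j, U, w, b) for h_j in H] for h_i in H]


def _log1p_sum_exp(xs):
    # Numerically stable log(1 + sum(exp(x))): a log-sum-exp with an extra 0.
    xs = [0.0] + list(xs)
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def multilabel_cross_entropy(scores, targets):
    """Assumed loss: log(1 + sum_neg exp(s)) + log(1 + sum_pos exp(-s)).

    `scores` and `targets` are flattened matrix cells; targets are 0 or 1.
    """
    neg = [s for s, t in zip(scores, targets) if t == 0]
    pos = [-s for s, t in zip(scores, targets) if t == 1]
    return _log1p_sum_exp(neg) + _log1p_sum_exp(pos)


# Toy example: two tokens with 2-dimensional hypothetical encodings.
H = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [0.0, 0.0, 0.0, 0.0]
b = 0.0

matrix = score_matrix(H, U, w, b)
flat_scores = [s for row in matrix for s in row]
flat_targets = [1, 0, 0, 1]  # hypothetical gold cells on the diagonal
loss = multilabel_cross_entropy(flat_scores, flat_targets)
```

Because every token pair is scored in a single matrix, entities and relations can be read off together instead of in a pipeline, which is the property the abstract credits with mitigating error propagation. This form of the loss pushes positive cells above zero and negative cells below zero without weighting the many empty cells individually.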
We evaluated BCSLinker on electronic medical record data of VTE patients from a tertiary hospital, where it achieved an F1 score of 86.9% and outperformed the other joint entity and relation extraction models compared in this study. In addition, we developed a question-answering system that uses the VTEKG as its structured data source.
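For readers unfamiliar with how an F1 score is typically computed for joint extraction, the sketch below scores predicted (head, relation, tail) triples against gold triples by exact match. The abstract does not state the matching criterion used in the study, so exact match is an assumption, and the triples shown are invented for illustration only.

```python
def triple_f1(predicted, gold):
    """Precision, recall, and F1 over exactly matching triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical triples, not taken from the study's data.
gold_triples = [
    ("VTE", "has_symptom", "leg swelling"),
    ("VTE", "treated_with", "anticoagulant"),
    ("VTE", "examined_by", "ultrasound"),
]
predicted_triples = [
    ("VTE", "has_symptom", "leg swelling"),
    ("VTE", "treated_with", "anticoagulant"),
    ("VTE", "has_symptom", "ultrasound"),  # wrong relation, counts as an error
]

precision, recall, f1 = triple_f1(predicted_triples, gold_triples)
```

Under exact matching, a triple with the right entities but the wrong relation counts as both a false positive and a false negative, which is why joint models are scored on whole triples and not on entities alone.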
This study constructed a more accurate and comprehensive VTEKG that can serve as a reference for diagnosing, evaluating, and treating VTE and can support patient self-care, which is of considerable clinical value.