School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China.
School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Comput Biol Med. 2024 Nov;182:109161. doi: 10.1016/j.compbiomed.2024.109161. Epub 2024 Sep 18.
The advancement of medical informatization requires extracting entities and their relationships from electronic medical records. Current research on electronic medical records concentrates mainly on single-entity relationship extraction. Clinical records, however, frequently contain complex, overlapping entity relationships, which makes information extraction harder. To address the lack of a clinical medical relationship extraction dataset, this study uses the electronic medical records of 584 patients from one hospital to build a small clinical medical relationship extraction dataset. Pipeline relationship extraction models overlook one-to-many correspondences between entities and relationships; to address this limitation, this paper introduces a cascading relationship extraction model. The model combines the MacBERT pre-trained model, a gated recurrent network, and a multi-head self-attention mechanism to improve text feature extraction, and incorporates adversarial learning to improve robustness. One-to-many relationships between entities are handled as a two-phase task: the main entity is predicted first, and the associated objects and their corresponding relationships are predicted second. This cascade structure lets the model handle intricate entity relationships flexibly, which improves extraction accuracy. Experimental results demonstrate the model's effectiveness: it achieves relationship extraction F1-scores of 82.8%, 76.8%, and 88.2% on the DuIE, CHIP-CDEE, and private datasets, respectively, improving on the benchmark model. The findings indicate that the model is applicable in practical domains, particularly in tasks such as biomedical information extraction.
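The two-phase cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the start/end pointer decoding, the 0.5 threshold, and the names `decode_spans`, `cascade_extract`, `toy_subject_scorer`, and `toy_object_scorer` are all assumptions made for illustration, and the hand-written toy scorers stand in for the neural encoder (MacBERT, gated recurrent network, self-attention), which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]                   # inclusive token indices (start, end)
Probs = Tuple[List[float], List[float]]  # per-token (start, end) probabilities


def decode_spans(start: List[float], end: List[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair every start above the threshold with the nearest end at or after it."""
    spans: List[Span] = []
    for i, p_start in enumerate(start):
        if p_start < threshold:
            continue
        for j in range(i, len(end)):
            if end[j] >= threshold:
                spans.append((i, j))
                break
    return spans


def cascade_extract(
    tokens: List[str],
    subject_scorer: Callable[[List[str]], Probs],
    object_scorer: Callable[[List[str], Span], Dict[str, Probs]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Phase 1: predict main entities. Phase 2: for each main entity,
    predict its objects separately for every relation type."""
    triples: List[Tuple[str, str, str]] = []
    sub_start, sub_end = subject_scorer(tokens)
    for s_begin, s_end in decode_spans(sub_start, sub_end, threshold):
        subject = " ".join(tokens[s_begin:s_end + 1])
        # Object scores are conditioned on the chosen subject, so one
        # subject can yield several objects (the one-to-many case).
        per_relation = object_scorer(tokens, (s_begin, s_end))
        for relation, (obj_start, obj_end) in per_relation.items():
            for o_begin, o_end in decode_spans(obj_start, obj_end, threshold):
                obj = " ".join(tokens[o_begin:o_end + 1])
                triples.append((subject, relation, obj))
    return triples


# Hypothetical stand-ins for the trained model (assumption: toy data only).
def toy_subject_scorer(tokens: List[str]) -> Probs:
    start, end = [0.0] * len(tokens), [0.0] * len(tokens)
    start[0] = end[0] = 0.9              # marks token 0 as the main entity
    return start, end


def toy_object_scorer(tokens: List[str], subject: Span) -> Dict[str, Probs]:
    start, end = [0.0] * len(tokens), [0.0] * len(tokens)
    for i in (2, 4):                     # two single-token objects
        start[i] = end[i] = 0.9
    return {"treats": (start, end)}


TOKENS = ["aspirin", "treats", "headache", "and", "fever"]
triples = cascade_extract(TOKENS, toy_subject_scorer, toy_object_scorer)
print(triples)
```

With the toy scorers, the single main entity "aspirin" is linked to both "headache" and "fever" under the same relation, which is the one-to-many case that a pipeline producing one relation per entity pair in isolation tends to miss.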