Automatic knowledge extraction from Chinese electronic medical records and rheumatoid arthritis knowledge graph construction.

Author information

Liu Feifei, Liu Mingtong, Li Meiting, Xin Yuwei, Gao Dongping, Wu Jun, Zhu Jiaan

Affiliations

Department of Ultrasound, Binzhou Medical University Hospital, Binzhou, China.

Department of Ultrasound, Peking University People's Hospital, Beijing, China.

Publication information

Quant Imaging Med Surg. 2023 Jun 1;13(6):3873-3890. doi: 10.21037/qims-22-1158. Epub 2023 May 8.

Abstract

BACKGROUND

Knowledge graphs are a powerful tool for organizing knowledge, processing information and integrating scattered information, effectively visualizing the relationships among entities and supporting further intelligent applications. One of the critical tasks in building knowledge graphs is knowledge extraction. The existing knowledge extraction models in the Chinese medical domain usually require high-quality and large-scale manually labeled corpora for model training. In this study, we investigate rheumatoid arthritis (RA)-related Chinese electronic medical records (CEMRs) and address the automatic knowledge extraction task with a small number of annotated samples from CEMRs, from which an authoritative RA knowledge graph is constructed.

METHODS

After constructing the RA domain ontology and completing manual labeling, we propose the MC-BERT-BiLSTM-CRF model, which stacks a bidirectional long short-term memory (BiLSTM) layer and a conditional random field (CRF) layer on MC-BERT, a bidirectional encoder representations from transformers (BERT) model, for the named entity recognition (NER) task, and the MC-BERT + feedforward neural network (FFNN) model for the relation extraction task. The pretrained language model (MC-BERT) is trained on a large amount of unlabeled medical data and fine-tuned using other medical domain datasets. We apply the established model to automatically label the remaining CEMRs. An RA knowledge graph is then constructed from the extracted entities and entity relations, a preliminary assessment is conducted, and an intelligent application is presented.
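To make the last step concrete, the sketch below shows one common way to turn sequence-labeling output into entities and then into a graph of triples. It is an illustration only, not the authors' pipeline: the abstract does not state the tagging scheme, so BIO tags are assumed, and the labels `Disease` and `Symptom`, the relation `has_symptom`, and the function names are hypothetical.

```python
from collections import defaultdict

def bio_to_spans(tokens, tags, sep=" "):
    """Collapse BIO tags (an assumed scheme) into (entity_text, entity_type) pairs."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity still open.
            if current:
                spans.append((sep.join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continue the open entity of the same type.
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open entity.
            if current:
                spans.append((sep.join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((sep.join(current), current_type))
    return spans

def build_graph(triples):
    """Store (head, relation, tail) triples as an adjacency map."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
    return graph

# Hypothetical tagger output for one sentence.
tokens = ["rheumatoid", "arthritis", "with", "morning", "stiffness"]
tags = ["B-Disease", "I-Disease", "O", "B-Symptom", "I-Symptom"]
entities = bio_to_spans(tokens, tags)

# Hypothetical relation predicted between the two entities.
graph = build_graph([(entities[0][0], "has_symptom", entities[1][0])])
```

For character-level Chinese input, `sep=""` would join the characters without spaces. In practice the triples would more likely be loaded into a graph database than held in an in-memory map; the abstract does not say which storage the authors used.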

RESULTS

The proposed model outperformed other widely used models in knowledge extraction tasks, with mean F1 scores of 92.96% in entity recognition and 95.29% in relation extraction. This study preliminarily confirmed that using a pretrained medical language model could address the need for a large number of manual annotations in knowledge extraction from CEMRs. An RA knowledge graph was constructed from the entities identified and the relations extracted from 1,986 CEMRs. Experts verified the effectiveness of the constructed RA knowledge graph.
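For readers unfamiliar with the metric, the following minimal sketch shows how an F1 score is typically computed from gold and predicted annotations and then averaged. The abstract does not say how the mean was taken, so averaging per-type F1 scores is an assumption here, and the function names and toy data are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall, and F1 between two collections of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def mean_f1(gold_by_type, predicted_by_type):
    """Average the per-type F1 scores (assumed meaning of 'mean F1')."""
    scores = [
        precision_recall_f1(gold_by_type[t], predicted_by_type.get(t, []))[2]
        for t in gold_by_type
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy data: each annotation is (record_id, start, end) for one entity type.
gold = [(1, 0, 2), (1, 5, 7), (2, 3, 4), (2, 8, 9)]
predicted = [(1, 0, 2), (1, 5, 7), (2, 3, 4), (2, 6, 7), (3, 0, 1)]
p, r, f1 = precision_recall_f1(gold, predicted)
```

Here three of the five predictions match the four gold annotations, so precision is 3/5, recall is 3/4, and F1 is their harmonic mean, 2/3.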

CONCLUSIONS

In this paper, an RA knowledge graph based on CEMRs was established, the processes of data annotation, automatic knowledge extraction, and knowledge graph construction were described, and a preliminary assessment and an application were presented. The study demonstrated the viability of a pretrained language model combined with a deep neural network for knowledge extraction tasks from CEMRs based on a small number of manually annotated samples.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3155/10240026/e21fda896c36/qims-13-06-3873-f1.jpg
