Real-world data medical knowledge graph: construction and applications.

Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing, China; Yidu Cloud Technology Inc., Beijing, China.

College of Computer Science, Chongqing University, Chongqing, China; Southwest Hospital, Chongqing, China.

Publication information

Artif Intell Med. 2020 Mar;103:101817. doi: 10.1016/j.artmed.2020.101817. Epub 2020 Feb 6.

DOI: 10.1016/j.artmed.2020.101817
PMID: 32143785
Abstract

OBJECTIVE

Medical knowledge graphs (KGs) are attracting attention from both academia and the healthcare industry because of their power in intelligent healthcare applications. In this paper, we introduce a systematic approach to building a medical KG from electronic medical records (EMRs), evaluated through both technical experiments and end-to-end application examples.

MATERIALS AND METHODS

The original data set contains 16,217,270 de-identified clinical visit records of 3,767,198 patients. The KG construction procedure comprises 8 steps: data preparation, entity recognition, entity normalization, relation extraction, property calculation, graph cleaning, related-entity ranking, and graph embedding. We propose a novel quadruplet structure to represent medical knowledge in place of the classical KG triplet. We also propose a novel related-entity ranking function that considers probability, specificity and reliability (PSR). In addition, the probabilistic translation on hyperplanes (PrTransH) algorithm is used to learn graph embeddings for the generated KG.
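As background for the embedding step, the following is a minimal sketch of the scoring function of plain TransH (translation on hyperplanes), the non-probabilistic method that the name PrTransH builds on. It is not the PrTransH algorithm itself, whose probabilistic extension is not described in this abstract. The function names and toy vectors are illustrative assumptions.

```python
import math


def project_onto_hyperplane(vec, normal):
    """Project vec onto the hyperplane whose normal vector is `normal`."""
    norm = math.sqrt(sum(w * w for w in normal))
    unit = [w / norm for w in normal]  # TransH uses a unit-length normal
    dot = sum(v * w for v, w in zip(vec, unit))
    return [v - dot * w for v, w in zip(vec, unit)]


def transh_score(head, tail, normal, translation):
    """Plain TransH score: squared distance ||h_perp + d_r - t_perp||^2.

    Lower scores mean the (head, relation, tail) triplet is more plausible.
    `normal` and `translation` are the relation-specific parameters.
    """
    h_perp = project_onto_hyperplane(head, normal)
    t_perp = project_onto_hyperplane(tail, normal)
    return sum((h + d - t) ** 2 for h, d, t in zip(h_perp, translation, t_perp))


# Toy 3-dimensional example (made-up vectors, not data from the paper).
head = [1.0, 0.0, 2.0]
tail = [2.0, 0.0, 5.0]
normal = [0.0, 0.0, 1.0]       # hyperplane is the x-y plane
translation = [1.0, 0.0, 0.0]  # relation vector lying in that plane
print(transh_score(head, tail, normal, translation))
```

Because each relation has its own hyperplane, the same entity can take different projected positions under different relations, which is the usual motivation for TransH over a single shared translation space.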

RESULTS

A medical KG with 9 entity types, including disease and symptom, was established; it contains 22,508 entities and 579,094 quadruplets. Compared with the term frequency-inverse document frequency (TF-IDF) method, the proposed ranking function increased the normalized discounted cumulative gain (NDCG@10) from 0.799 to 0.906. Embedding representations were learned for all entities and relations and were shown to be effective through disease clustering.
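For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 is commonly computed for one ranked list. It assumes linear gain and a log2 rank discount, and it derives the ideal ordering from the same list of judged items; the abstract does not state which variant the authors used, and the relevance grades below are made up.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of related entities in the order a ranker returned them.
ranked_grades = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked_grades, k=10), 3))
```

A score of 1.0 means the ranker returned the items in the best possible order, so the reported rise from 0.799 to 0.906 indicates that the proposed ranking places highly relevant entities nearer the top than the TF/IDF baseline does.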

CONCLUSION

The established systematic procedure can efficiently construct a high-quality medical KG from large-scale EMRs. The proposed ranking function PSR achieves the best performance under all relations, and the disease clustering result validates the efficacy of the learned embedding vectors as semantic representations of entities. Moreover, the resulting KG has been used successfully in many applications because of its statistics-based quadruplets.

In the definition of the reliability value, N is a minimum co-occurrence number and R is the basic reliability value. The reliability value measures how reliable the relationship between S and O is. The rationale is that the higher the co-occurrence number N(S, O), the more reliable the relationship. However, the reliability values of two relationships should not differ much when both of their co-occurrence numbers are very large. In our study, we set N = 10 and R = 1 after some experiments. For instance, if the co-occurrence numbers of three relationships are 1, 100 and 10000, their reliability values are 1, 2.96 and 5 respectively.
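The reliability formula itself does not appear in this excerpt. The sketch below uses a hypothetical form, R + log10(max(n - N, 0) + 1), chosen only because it reproduces the three quoted values with N = 10 and R = 1 and grows slowly for very large counts. It should be read as an assumption, not as the authors' definition.

```python
import math

MIN_COOCCURRENCE = 10   # N in the text
BASE_RELIABILITY = 1.0  # R in the text


def reliability(cooccurrence, n_min=MIN_COOCCURRENCE, base=BASE_RELIABILITY):
    """Hypothetical reliability value for a relationship between S and O.

    Counts at or below n_min receive the base value; above n_min the value
    grows with log10 of the excess, so very large counts differ only slightly.
    """
    excess = max(cooccurrence - n_min, 0)
    return base + math.log10(excess + 1)


for count in (1, 100, 10000):
    print(count, round(reliability(count), 2))
# prints:
# 1 1.0
# 100 2.96
# 10000 5.0
```

The logarithm is what keeps two frequently co-occurring pairs close together: a hundredfold increase in co-occurrence, from 100 to 10000, adds only about 2 to the value.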
