Research on Traditional Chinese Medicine: Domain Knowledge Graph Completion and Quality Evaluation

Authors

Liu Chang, Li Zhan, Li Jianmin, Qu Yiqian, Chang Ying, Han Qing, Cao Lingyong, Lin Shuyuan

Affiliations

School of Basic Medical Sciences, Zhejiang Chinese Medical University, Hangzhou, China.

Breast Disease Specialist Hospital of Guangdong Provincial Hospital of Chinese Medicine, Guangdong Provincial Hospital of Chinese Medicine, Guangzhou, China.

Publication information

JMIR Med Inform. 2024 Aug 2;12:e55090. doi: 10.2196/55090.

DOI: 10.2196/55090

PMID: 39094109

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11329848/

Abstract

BACKGROUND

Knowledge graphs (KGs) can integrate domain knowledge into a traditional Chinese medicine (TCM) intelligent syndrome differentiation model. However, the quality of current KGs in the TCM domain varies greatly, which is related to the lack of knowledge graph completion (KGC) and evaluation methods.

OBJECTIVE

This study aims to investigate KGC and evaluation methods tailored for TCM domain knowledge.

METHODS

In the KGC phase, we proposed a 3-step "entity-ontology-path" completion approach based on the characteristics of TCM domain knowledge. This approach uses path reasoning, ontology rule reasoning, and association rules. In the KGC quality evaluation phase, we proposed a 3-dimensional evaluation framework that encompasses completeness, accuracy, and usability, using quantitative metrics derived from complex network analysis, ontology reasoning, and graph representation. Furthermore, we compared the impact of different graph representation models on KG usability.
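To make the idea of path-based completion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a path rule of the form "relation r1 followed by relation r2 suggests relation r3"; the entity names, relation names, and the function `complete_by_paths` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy triples; entity and relation names are illustrative only.
triples = {
    ("syndrome_A", "has_symptom", "symptom_X"),
    ("symptom_X", "located_in", "organ_Y"),
    ("syndrome_B", "has_symptom", "symptom_Z"),
}

# Hypothetical path rule: (r1, r2) -> r3 means that
# (a, r1, b) and (b, r2, c) together suggest (a, r3, c).
path_rules = {("has_symptom", "located_in"): "involves_organ"}


def complete_by_paths(triples, path_rules):
    """Return candidate triples implied by two-hop paths but absent from the KG."""
    # Index outgoing edges by head entity for quick two-hop lookup.
    outgoing = defaultdict(list)
    for head, rel, tail in triples:
        outgoing[head].append((rel, tail))

    candidates = set()
    for head, rel1, mid in triples:
        for rel2, tail in outgoing[mid]:
            inferred_rel = path_rules.get((rel1, rel2))
            if inferred_rel is None:
                continue
            candidate = (head, inferred_rel, tail)
            if candidate not in triples:
                candidates.add(candidate)
    return candidates


new_triples = complete_by_paths(triples, path_rules)
print(new_triples)
```

In practice, candidates produced this way would still need to be checked against ontology constraints or expert knowledge before being added to the KG.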

RESULTS

In the KGC phase, 52, 107, 27, and 479 triples were added by outlier analysis, rule-based reasoning, association rules, and path-based reasoning, respectively. In addition, rule-based reasoning identified 14 contradictory triples. In the KGC quality evaluation phase, in terms of completeness, the KG had higher density and lower sparsity after completion, and it contained no contradictory rules. In terms of accuracy, the completed KG was more consistent with prior knowledge. In terms of usability, the mean reciprocal rank (MRR), mean rank (MR), and Hits@N (the hit rate among the first N tail entities predicted by the model) of the TransE, RotatE, DistMult, and ComplEx graph representation models all improved after KGC. Among these models, RotatE achieved the best representation.
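The usability metrics named above follow standard link-prediction definitions. As a minimal sketch, assuming each test triple yields the 1-based rank of its true tail entity among all candidates, they can be computed as follows. The function name `ranking_metrics` and the example ranks are hypothetical and are not taken from the study.

```python
def ranking_metrics(ranks, n_values=(1, 3, 10)):
    """Compute MR, MRR, and Hits@N from 1-based ranks of the true tail entities."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    total = len(ranks)
    metrics = {
        # Mean rank: lower is better.
        "MR": sum(ranks) / total,
        # Mean reciprocal rank: higher is better, bounded by 1.
        "MRR": sum(1.0 / r for r in ranks) / total,
    }
    for n in n_values:
        # Hits@N: fraction of test triples whose true tail ranks in the top N.
        metrics[f"Hits@{n}"] = sum(1 for r in ranks if r <= n) / total
    return metrics


# Hypothetical ranks for four test triples (not data from the study).
example_ranks = [1, 2, 4, 10]
result = ranking_metrics(example_ranks)
print(result)
```

Because a lower mean rank is better while higher MRR and Hits@N are better, an improvement after KGC corresponds to MR decreasing and the other two increasing.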

CONCLUSIONS

The 3-step completion approach can effectively improve the completeness, accuracy, and usability of KGs, and the 3-dimensional evaluation framework can be used for comprehensive KGC evaluation. In the TCM field, the RotatE model performed best at KG representation among the models compared.
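For readers unfamiliar with the models compared, the sketch below contrasts the scoring functions of TransE and RotatE as they are commonly defined: TransE treats a relation as a translation in real vector space, whereas RotatE treats it as an element-wise rotation in complex space. The choice of norm (L2 for TransE, a sum of complex moduli for RotatE) is an assumption here, and the toy embeddings are hypothetical. The abstract does not state why RotatE performed best on this KG.

```python
import cmath
import math


def transe_score(head, relation, tail):
    """TransE: negative L2 distance between head + relation and tail (real vectors)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rotate_score(head, phases, tail):
    """RotatE: negative distance between the rotated head and the tail.

    Each relation component is a unit-modulus complex number exp(i * phase),
    so multiplying by it rotates the corresponding head component.
    """
    return -sum(abs(h * cmath.exp(1j * p) - t) for h, p, t in zip(head, phases, tail))


# Hypothetical 2-dimensional embeddings constructed so that both triples fit exactly.
te = transe_score([1.0, 2.0], [0.5, -1.0], [1.5, 1.0])
ro = rotate_score([1 + 0j, 0 + 1j], [math.pi / 2, math.pi / 2], [0 + 1j, -1 + 0j])
print(te, ro)
```

In both models a score closer to zero indicates a more plausible triple, and candidate tails are ranked by this score when computing link-prediction metrics.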
