Yu Guangya, Ye Qi, Ruan Tong
Zhejiang Laboratory, Hangzhou 311121, China.
School of Information Science and Technology, East China University of Science and Technology, Shanghai 200237, China.
Bioengineering (Basel). 2024 Feb 27;11(3):225. doi: 10.3390/bioengineering11030225.
The construction of medical knowledge graphs (MKGs) is steadily shifting from manual to automatic methods, which inevitably introduce noise that can impair the performance of downstream healthcare applications. Existing error detection approaches rely on the topological structure and external labels of entities in MKGs to improve their quality. However, owing to the cost of manual annotation and the imperfection of automatic algorithms, precise entity labels in MKGs are not readily available. To address these issues, we propose an approach named Enhancing error detection on Medical knowledge graphs via intrinsic labEL (EMKGEL). Given the absence of a hyper-view KG, we construct a hyper-view KG and a triplet-level KG to capture implicit label information and neighborhood information, respectively. Inspired by the success of graph attention networks (GATs), we introduce a hyper-view GAT that incorporates label messages and neighborhood information into representation learning. We then estimate the trustworthiness of each triplet with a confidence score that combines local and global trustworthiness. To validate the effectiveness of our approach, we conducted experiments on three publicly available MKGs: PharmKG-8k, DiseaseKG, and DiaKG. Compared with the baseline models, our approach improved Precision@K by 0.7%, 6.1%, and 3.6% on these datasets, respectively. Furthermore, empirical results showed that our method significantly outperformed the baseline models on a general knowledge graph, Nell-995.
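In KG error detection, Precision@K is commonly computed by ranking all triplets by their confidence score and checking how many of the K least trustworthy triplets are genuinely erroneous. The sketch below assumes that evaluation protocol and a ground-truth error label per triplet; the function name `precision_at_k` and the toy scores are illustrative and are not taken from the EMKGEL paper.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], is_error: Sequence[bool], k: int) -> float:
    """Fraction of the k lowest-confidence triplets that are true errors.

    Assumption: a lower confidence score means a triplet is more likely noisy.
    """
    if len(scores) != len(is_error):
        raise ValueError("scores and is_error must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Rank triplet indices by ascending confidence (least trustworthy first).
    # sorted() is stable, so tied scores keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    # Count how many of the top-k suspicious triplets are labeled as errors.
    hits = sum(1 for i in ranked[:k] if is_error[i])
    return hits / k


if __name__ == "__main__":
    # Hypothetical confidence scores for five triplets and their true labels.
    toy_scores = [0.9, 0.1, 0.8, 0.2, 0.7]
    toy_labels = [False, True, False, False, True]
    print(precision_at_k(toy_scores, toy_labels, 2))  # → 0.5
```

With K = 2, the two lowest-scoring triplets are indices 1 and 3, of which only index 1 is an error, giving 0.5. How ties are broken can affect the metric when many triplets share a score, so a real evaluation should fix a tie-breaking rule explicitly.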