National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China.
BMC Med Inform Decis Mak. 2021 Nov 29;21(Suppl 9):335. doi: 10.1186/s12911-021-01622-7.
Knowledge graphs (KGs), especially medical knowledge graphs, are often significantly incomplete, which creates a need for medical knowledge graph completion (MedKGC). MedKGC infers new facts from the knowledge already present in a KG. Path-based knowledge reasoning is one of the most important approaches to this task and has received great attention in recent years because of its high performance and interpretability. Traditional methods such as the path ranking algorithm treat the paths between an entity pair as atomic features. However, medical KGs are very sparse, which makes it difficult to learn effective semantic representations for extremely sparse path features. This sparsity is mainly reflected in the long-tailed distribution of entities and paths. Previous methods consider only the structural context of the paths in the knowledge graph and ignore the textual semantics of the symbols in the path. As a result, entity sparseness and path sparseness limit how far their performance can be improved.
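To make the notion of "paths as atomic features" concrete, the following is a minimal sketch of path-feature extraction in the spirit of the path ranking algorithm. It is not the authors' implementation: the toy triples, the relation names, and the helpers `build_adjacency`, `relation_paths` and `path_features` are all hypothetical. Each distinct relation sequence between an entity pair becomes one feature, so a rarely seen sequence gets almost no statistical support.

```python
from collections import Counter

# Toy triples (head, relation, tail); every name here is hypothetical.
TRIPLES = [
    ("cough", "symptom_of", "bronchitis"),
    ("bronchitis", "treated_by", "antibiotic"),
    ("cough", "symptom_of", "pneumonia"),
    ("pneumonia", "treated_by", "antibiotic"),
    ("pneumonia", "examined_by", "chest_xray"),
    ("fever", "symptom_of", "pneumonia"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = {}
    for head, relation, tail in triples:
        adjacency.setdefault(head, []).append((relation, tail))
    return adjacency


def relation_paths(adjacency, source, target, max_len=3):
    """Return every relation sequence of length <= max_len from source to target."""
    found = []
    stack = [(source, ())]
    while stack:
        node, relations = stack.pop()
        if node == target and relations:
            found.append(relations)
            continue
        if len(relations) == max_len:
            continue
        for relation, neighbor in adjacency.get(node, []):
            stack.append((neighbor, relations + (relation,)))
    return found


def path_features(adjacency, source, target, max_len=3):
    """Count each relation sequence; one sequence is one atomic feature."""
    return Counter(relation_paths(adjacency, source, target, max_len))


ADJ = build_adjacency(TRIPLES)
features = path_features(ADJ, "cough", "antibiotic")
print(features)
```

Because the feature key is the whole relation sequence, two paths that differ in a single relation share nothing, which is one way to see why long-tailed paths are hard to model as atomic symbols.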
To address these issues, this paper proposes two novel path-based reasoning methods, one targeting entity sparsity and the other path sparsity, both of which use the textual semantic information of entities and paths for MedKGC. Using the pre-trained model BERT to combine the textual semantic representations of entities and relations, we recast symbolic reasoning in the medical KG as numerical computation over textual semantic representations.
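The general idea of replacing symbol matching with numerical computation over text representations can be sketched as follows. This is an illustration under stated assumptions, not the method of the paper: the verbalization template is hypothetical, and `toy_encode` (character trigram counts) is only a stand-in for a pre-trained encoder such as BERT so that the example runs without downloading a model. The names `verbalize`, `toy_encode`, `cosine` and `score` are introduced here for illustration.

```python
import math
from collections import Counter


def verbalize(head, relations, tail):
    """Turn a symbolic path into plain text (hypothetical template)."""
    words = [head] + [r.replace("_", " ") for r in relations] + [tail]
    return " ".join(words)


def toy_encode(text, n=3):
    """Stand-in for a pre-trained text encoder: character n-gram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score(head, relations, tail, query_relation):
    """Score how well a path supports the triple (head, query_relation, tail)."""
    path_vec = toy_encode(verbalize(head, relations, tail))
    query_vec = toy_encode(verbalize(head, [query_relation], tail))
    return cosine(path_vec, query_vec)


related = score("cough", ["symptom_of", "treated_by"], "antibiotic", "treated_by")
unrelated = score("cough", ["symptom_of", "treated_by"], "antibiotic", "examined_by")
print(related, unrelated)
```

With text-derived vectors, paths and relations that share wording receive related representations even when the exact symbol sequence is rare. In a real system the stand-in encoder would be replaced by a trained model.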
Experimental results on a public, authoritative Chinese symptom knowledge graph demonstrate that the proposed method is significantly better than state-of-the-art path-based knowledge graph reasoning methods, with an average improvement of 5.83% across all relations.
In this paper, we propose two new knowledge graph reasoning algorithms that use the textual semantic information of entities and paths and can effectively alleviate the sparsity of entities and paths in MedKGC. To the best of our knowledge, this is the first method to use pre-trained language models and textual path representations for medical knowledge reasoning. Our method can complete the impaired symptom knowledge graph in an interpretable way, and it outperforms state-of-the-art path-based reasoning methods.