Tian Hao, Zhang Xiaoxiong, Wang Yuhan, Zeng Daojian
School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China.
The Sixty-Third Research Institute, National University of Defense Technology, Nanjing 210007, China.
Entropy (Basel). 2022 Oct 20;24(10):1495. doi: 10.3390/e24101495.
Knowledge graph completion is an important technique for supplementing knowledge graphs and improving data quality. However, existing knowledge graph completion methods ignore the relation features of triples, and the entity description texts they introduce are long and redundant. To address these problems, this study proposes a multi-task learning and improved TextRank for knowledge graph completion (MIT-KGC) model. Key contexts are first extracted from redundant entity descriptions using the improved TextRank algorithm. Then, A Lite BERT (ALBERT) is used as the text encoder to reduce the number of model parameters. Subsequently, a multi-task learning method is used to fine-tune the model, effectively integrating entity and relation features. Experiments were conducted with the proposed model on the WN18RR, FB15k-237, and DBpedia50k datasets. The results showed that, compared with traditional methods, the mean rank (MR) was improved by 38 and the top-10 and top-3 hit ratios (Hit@10, Hit@3) by 1.3% and 1.9%, respectively, on WN18RR; the MR was improved by 23 and Hit@10 by 0.7% on FB15k-237; and Hit@3 and the top-1 hit ratio (Hit@1) were improved by 3.1% and 1.5%, respectively, on DBpedia50k, verifying the validity of the model.
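The abstract does not detail the paper's improvements to TextRank, but the baseline idea of extracting key sentences from a long entity description can be sketched as follows. This is a minimal, hedged implementation of standard TextRank (sentence-similarity graph plus weighted PageRank); the function and example sentences are illustrative, not the paper's actual code.

```python
import math

def sentence_similarity(a, b):
    # Overlap similarity from the original TextRank formulation:
    # |shared words| / (log|S_a| + log|S_b|)
    wa, wb = a.lower().split(), b.lower().split()
    common = len(set(wa) & set(wb))
    denom = math.log(len(wa)) + math.log(len(wb))
    return common / denom if denom > 0 else 0.0

def textrank_key_sentences(sentences, top_k=2, d=0.85, iters=50):
    # Build an undirected similarity graph over the sentences.
    n = len(sentences)
    sim = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    # Weighted PageRank iteration over the graph.
    scores = [1.0] * n
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order

# Hypothetical entity description, split into sentences.
desc = [
    "Albert Einstein was a German-born theoretical physicist.",
    "He developed the theory of relativity.",
    "Einstein is best known as a theoretical physicist.",
    "He also enjoyed sailing and playing the violin.",
]
key_context = textrank_key_sentences(desc, top_k=2)
```

The extracted `key_context` would then replace the full description as input to the text encoder, shortening the sequence the model must process.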
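The reported metrics have standard definitions in knowledge graph completion: MR is the average rank assigned to the correct entity across test triples (lower is better), and Hit@k is the fraction of test triples where the correct entity ranks in the top k (higher is better). A minimal sketch, using hypothetical ranks rather than the paper's data:

```python
def mean_rank(ranks):
    # MR: average rank of the correct entity over all test triples (lower is better).
    return sum(ranks) / len(ranks)

def hit_at_k(ranks, k):
    # Hit@k: fraction of test triples whose correct entity ranks in the top k.
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 12, 2, 40]   # hypothetical gold-entity ranks
print(mean_rank(ranks))     # 11.6
print(hit_at_k(ranks, 10))  # 0.6
```

An "improvement of 38 in MR" thus means the average rank of the correct entity dropped by 38 positions, while the Hit@k gains are absolute percentage-point increases.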