Deep learning based searching approach for RDF graphs.

Author affiliation

College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.

Publication information

PLoS One. 2020 Mar 23;15(3):e0230500. doi: 10.1371/journal.pone.0230500. eCollection 2020.

DOI:10.1371/journal.pone.0230500

PMID:32203547

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7089531/

Abstract

The Internet is a remarkably complex technical system. Its rapid growth has also created technical challenges, notably for information retrieval. Search engines retrieve information based on the keywords provided. Consequently, it is difficult to find the required information accurately without understanding the syntax and semantics of the content. Multiple approaches have been proposed to resolve this problem by employing semantic web and linked data techniques. Such approaches serialize the content using the Resource Description Framework (RDF) and execute queries using SPARQL. However, an exact match between the RDF content and the query structure is required. Although this improves on keyword-based search, it does not provide probabilistic reasoning to find the semantic relationship between queries and their results. In this paper, we therefore propose a deep learning-based approach for searching RDF graphs. The proposed approach treats document requests as a classification problem. First, we preprocess the RDF graphs to convert them into N-Triples format. Second, bag-of-words (BOW) and word2vec feature modeling techniques are combined into a novel deep representation of RDF graphs. An attention mechanism enables the proposed approach to understand the semantics between RDF graphs. Third, we train a convolutional neural network on the deep representation for accurate retrieval of RDF graphs. We employ 10-fold cross-validation to evaluate the proposed approach. The results show that the proposed approach is accurate and surpasses the state of the art. The average accuracy, precision, recall, and F-measure are up to 97.12%, 98.17%, 95.56%, and 96.85%, respectively.
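The first two steps named in the abstract (N-Triples preprocessing and a bag-of-words representation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sample triples, the `example.org` IRIs, the tokenization rule (keep the local name of each IRI, lowercase, split on non-alphanumerics), and the helper names `tokenize_triple`, `bag_of_words`, and `to_vector` are all assumptions made for illustration. The word2vec, attention, and convolutional network stages are not shown.

```python
import re
from collections import Counter

# Hypothetical sample data in N-Triples form (one triple per line).
SAMPLE_NT = """\
<http://example.org/book1> <http://example.org/title> "Deep Learning" .
<http://example.org/book1> <http://example.org/author> <http://example.org/alice> .
"""

# Matches an IRI in angle brackets or a plain quoted literal.
# Simplification: escaped quotes inside literals are not handled.
TERM = re.compile(r'<([^>]*)>|"([^"]*)"')


def tokenize_triple(line):
    """Split one N-Triples line into lowercase word tokens (assumed rule)."""
    tokens = []
    for iri, literal in TERM.findall(line):
        if iri:
            # Keep only the local name after the last '/' or '#'.
            text = re.split(r'[/#]', iri)[-1]
        else:
            text = literal
        tokens.extend(re.findall(r'[a-z0-9]+', text.lower()))
    return tokens


def bag_of_words(nt_text):
    """Count token occurrences over all triples in an N-Triples document."""
    counts = Counter()
    for line in nt_text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):  # skip blanks and comments
            counts.update(tokenize_triple(line))
    return counts


def to_vector(counts, vocabulary):
    """Project token counts onto a fixed, ordered vocabulary."""
    return [counts[word] for word in vocabulary]


bow = bag_of_words(SAMPLE_NT)
vocab = sorted(bow)
vector = to_vector(bow, vocab)
print(vocab)
print(vector)
```

In a fuller pipeline, a count vector like this would be one input feature alongside dense word2vec embeddings of the same tokens; how the two are combined is specific to the paper and is not reproduced here.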
