Querying knowledge graphs in natural language.

Author information

Liang Shiqi, Stockinger Kurt, de Farias Tarcisio Mendes, Anisimova Maria, Gil Manuel

Affiliations

ETH Swiss Federal Institute of Technology, Rämistrasse 101, 8092 Zurich, Switzerland.

Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur, Switzerland.

Publication information

J Big Data. 2021;8(1):3. doi: 10.1186/s40537-020-00383-w. Epub 2021 Jan 6.

DOI: 10.1186/s40537-020-00383-w

PMID: 33489717

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7799375/

Abstract

Knowledge graphs are a powerful concept for querying large amounts of data. These knowledge graphs are typically enormous and are often not easily accessible to end-users because they require specialized knowledge of query languages such as SPARQL. Moreover, end-users need a deep understanding of the structure of the underlying data models, which are often based on the Resource Description Framework (RDF). This drawback has led to the development of Question-Answering (QA) systems that enable end-users to express their information needs in natural language. While existing systems simplify user access, there is still room for improvement in their accuracy. In this paper we propose a new QA system for translating natural language questions into SPARQL queries. The key idea is to break up the translation process into 5 smaller, more manageable sub-tasks and to use ensemble machine learning methods as well as Tree-LSTM-based neural network models to learn automatically how to translate a natural language question into a SPARQL query. The performance of our proposed QA system is empirically evaluated on two well-known benchmarks: the 7th Question Answering over Linked Data Challenge (QALD-7) and the Large-Scale Complex Question Answering Dataset (LC-QuAD). Experimental results show that our QA system outperforms state-of-the-art systems by 15% on QALD-7 and by 48% on LC-QuAD. In addition, we make our source code available.
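To make the decomposition idea concrete, here is a minimal Python sketch of a question-to-SPARQL pipeline split into five small stages. The abstract does not say what the five sub-tasks are, so this particular split (question analysis, question-type classification, phrase mapping, candidate query generation and candidate ranking) is an assumption chosen for illustration. The lexicons, their prior scores and every function name are hypothetical; only the DBpedia `dbr:`/`dbo:` namespace URIs are real. A hand-written rule-based classifier stands in for the ensemble classifier, and a product-of-priors score stands in for the Tree-LSTM ranker, so the sketch shows the shape of such a pipeline, not the authors' models.

```python
from itertools import product

# Hypothetical toy lexicons: surface word -> [(KG identifier, prior score)].
# They stand in for a learned phrase-mapping step over a real knowledge graph.
ENTITIES = {
    "berlin": [("dbr:Berlin", 0.9), ("dbr:East_Berlin", 0.1)],
}
PROPERTIES = {
    "mayor": [("dbo:mayor", 0.6), ("dbo:leaderName", 0.4)],
    "population": [("dbo:populationTotal", 1.0)],
}
# Real DBpedia namespaces, declared so the generated query is valid SPARQL.
PREFIXES = (
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
)


def analyse(question):
    """Sub-task 1 (assumed): tokenize, strip punctuation, lowercase."""
    return [t.strip("?.,!").lower() for t in question.split()]


def classify_type(tokens):
    """Sub-task 2 (assumed): rule-based stand-in for a learned ensemble classifier."""
    if tokens[:2] == ["how", "many"]:
        return "COUNT"
    if tokens and tokens[0] in {"is", "are", "does", "did", "was"}:
        return "ASK"
    return "SELECT"


def map_phrases(tokens):
    """Sub-task 3 (assumed): look up candidate entities and properties per token."""
    ents = [c for t in tokens for c in ENTITIES.get(t, [])]
    props = [c for t in tokens for c in PROPERTIES.get(t, [])]
    return ents, props


def generate_candidates(ents, props):
    """Sub-task 4 (assumed): pair every entity with every property, scoring by product of priors."""
    return [((e, p), se * sp) for (e, se), (p, sp) in product(ents, props)]


def rank(candidates):
    """Sub-task 5 (assumed): pick the top-scoring pattern; a stand-in for a Tree-LSTM ranker."""
    if not candidates:
        raise ValueError("no candidate triple patterns for this question")
    return max(candidates, key=lambda c: c[1])[0]


def to_sparql(question):
    """Run the five stages in order and render the winning pattern as SPARQL."""
    tokens = analyse(question)
    qtype = classify_type(tokens)
    entity, prop = rank(generate_candidates(*map_phrases(tokens)))
    pattern = f"{{ {entity} {prop} ?x }}"
    if qtype == "ASK":
        body = f"ASK WHERE {pattern}"
    elif qtype == "COUNT":
        body = f"SELECT (COUNT(?x) AS ?n) WHERE {pattern}"
    else:
        body = f"SELECT ?x WHERE {pattern}"
    return PREFIXES + body
```

For example, `print(to_sparql("Who is the mayor of Berlin?"))` ends with `SELECT ?x WHERE { dbr:Berlin dbo:mayor ?x }`, because the pair (`dbr:Berlin`, `dbo:mayor`) has the highest combined prior (0.9 × 0.6). Keeping each stage behind its own function means any stand-in can be replaced by a learned model without touching the other stages.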
