

Evaluating the effectiveness of prompt engineering for knowledge graph question answering.

Author information

Kosten Catherine, Nooralahzadeh Farhad, Stockinger Kurt

Affiliations

School of Engineering, Institute of Computer Science, Intelligent Information Systems Research Group, Zurich University of Applied Sciences, Winterthur, Switzerland.

Publication information

Front Artif Intell. 2025 Jan 13;7:1454258. doi: 10.3389/frai.2024.1454258. eCollection 2024.

DOI:10.3389/frai.2024.1454258
PMID:39871862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11770024/
Abstract

Many different methods for prompting large language models have been developed since the emergence of OpenAI's ChatGPT in November 2022. In this work, we evaluate six different few-shot prompting methods. The first set of experiments evaluates three frameworks that focus on the quantity or type of shots in a prompt: a baseline method with a simple prompt and a small number of shots, random few-shot prompting with 10, 20, and 30 shots, and similarity-based few-shot prompting. The second set of experiments targets optimizing the prompt or enhancing shots through Large Language Model (LLM)-generated explanations, using three prompting frameworks: Explain then Translate, Question Decomposition Meaning Representation, and Optimization by Prompting. We evaluate these six prompting methods on the newly created Spider4SPARQL benchmark, as it is the most complex SPARQL-based Knowledge Graph Question Answering (KGQA) benchmark to date. Across the various prompting frameworks used, the commercial model is unable to achieve a score over 51%, indicating that KGQA, especially for complex queries with multiple hops, set operations, and filters, remains a challenging task for LLMs. Our experiments find that the most successful prompting framework for KGQA is a simple prompt combined with an ontology and five random shots.
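The best-performing setup the abstract reports (a simple prompt plus the knowledge graph ontology and five random shots) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the `build_prompt` function, the prompt layout, and the example shot pool are all assumptions, and real shots would be drawn from a benchmark such as Spider4SPARQL.

```python
import random

def build_prompt(question, ontology, examples, n_shots=5, seed=0):
    """Assemble a simple NL-to-SPARQL prompt: the KG ontology as context,
    followed by n randomly sampled (question, query) demonstration pairs,
    then the target question awaiting its SPARQL translation."""
    rng = random.Random(seed)  # fixed seed for reproducible shot selection
    shots = rng.sample(examples, min(n_shots, len(examples)))
    parts = ["Ontology:", ontology, ""]
    for q, sparql in shots:
        parts += [f"Question: {q}", f"SPARQL: {sparql}", ""]
    parts += [f"Question: {question}", "SPARQL:"]
    return "\n".join(parts)

# Hypothetical shot pool standing in for benchmark training pairs.
pool = [(f"example question {i}", f"SELECT ?x WHERE {{ ?x :p{i} ?y }}")
        for i in range(30)]
prompt = build_prompt("How many singers do we have?",
                      ":Singer a owl:Class .", pool)
```

The resulting string would then be sent to the LLM, whose completion after the final `SPARQL:` marker is taken as the candidate query.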


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e003/11770024/f8b3195e44d1/frai-07-1454258-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e003/11770024/63913884356b/frai-07-1454258-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e003/11770024/f3ac05e624bf/frai-07-1454258-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e003/11770024/4459f1d2bbcb/frai-07-1454258-g0004.jpg

Similar articles

1. Evaluating the effectiveness of prompt engineering for knowledge graph question answering.
   Front Artif Intell. 2025 Jan 13;7:1454258. doi: 10.3389/frai.2024.1454258. eCollection 2024.
2. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study.
   JMIR Med Inform. 2024 Apr 8;12:e55318. doi: 10.2196/55318.
3. Improving the use of LLMs in radiology through prompt engineering: from precision prompts to zero-shot learning.
   Rofo. 2024 Nov;196(11):1166-1170. doi: 10.1055/a-2264-5631. Epub 2024 Feb 26.
4. Knowledge graph-based thought: a knowledge graph-enhanced LLM framework for pan-cancer question answering.
   Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giae082.
5. Bio-SODA UX: enabling natural language question answering over knowledge graphs with user disambiguation.
   Distrib Parallel Databases. 2022;40(2-3):409-440. doi: 10.1007/s10619-022-07414-w. Epub 2022 Jul 16.
6. Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy.
   BMC Bioinformatics. 2024 Aug 27;25(1):281. doi: 10.1186/s12859-024-05902-7.
7. Model tuning or prompt tuning? A study of large language models for clinical concept and relation extraction.
   J Biomed Inform. 2024 May;153:104630. doi: 10.1016/j.jbi.2024.104630. Epub 2024 Mar 26.
8. Emotional prompting amplifies disinformation generation in AI large language models.
   Front Artif Intell. 2025 Apr 7;8:1543603. doi: 10.3389/frai.2025.1543603. eCollection 2025.
9. Evaluating the ChatGPT family of models for biomedical reasoning and classification.
   J Am Med Inform Assoc. 2024 Apr 3;31(4):940-948. doi: 10.1093/jamia/ocad256.
10. Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
   JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
