Painter Jeffery L, Chalamalasetti Venkateswara Rao, Kassekert Raymond, Bate Andrew
GlaxoSmithKline, Durham, NC 27701, United States.
Tech Mahindra, Plano, TX 75024, United States.
JAMIA Open. 2025 Feb 8;8(1):ooaf003. doi: 10.1093/jamiaopen/ooaf003. eCollection 2025 Feb.
This study aimed to enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.
We used OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM in random order and independently of the others to prevent memorization. The study was conducted in 3 phases that varied query complexity and assessed the LLM's performance both with and without the business context document.
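The setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the schema, the business context entries, the keyword-overlap retrieval, and the names `retrieve_context`, `build_prompt`, and `run_experiment` are all assumptions made for illustration, and the LLM call is passed in as a plain callable rather than tied to any particular API.

```python
import random
from typing import Callable, Dict, List

# Hypothetical schema and business-context entries (not from the study).
SCHEMA = "CREATE TABLE cases (case_id INTEGER, drug TEXT, event TEXT, serious INTEGER);"
BUSINESS_CONTEXT = [
    "A 'serious case' is a row in cases where serious = 1.",
    "Drug names in cases.drug are stored in upper case.",
    "An 'event' refers to the adverse event term stored in cases.event.",
]


def tokenize(text: str) -> set:
    """Lower-case words with surrounding punctuation stripped."""
    return {w.strip(".,?'\"()").lower() for w in text.split()} - {""}


def retrieve_context(nlq: str, entries: List[str], k: int = 2) -> List[str]:
    """Toy retrieval step: rank entries by word overlap with the question."""
    q = tokenize(nlq)
    ranked = sorted(entries, key=lambda e: len(q & tokenize(e)), reverse=True)
    return ranked[:k]


def build_prompt(nlq: str, schema: str, snippets: List[str]) -> str:
    """Assemble the prompt; with no snippets this is the schema-only condition."""
    parts = ["Translate the question into one SQL query.", "Schema:", schema]
    if snippets:
        parts.append("Business context:")
        parts.extend(f"- {s}" for s in snippets)
    parts.append(f"Question: {nlq}")
    return "\n".join(parts)


def run_experiment(
    nlqs: List[str],
    generate_sql: Callable[[str], str],  # stand-in for the LLM call
    use_context: bool,
    seed: int = 0,
) -> Dict[str, str]:
    """Present each NLQ in random order, one fresh prompt per question."""
    order = list(nlqs)
    random.Random(seed).shuffle(order)
    results = {}
    for nlq in order:
        snippets = retrieve_context(nlq, BUSINESS_CONTEXT) if use_context else []
        # No conversation history is carried over between questions.
        results[nlq] = generate_sql(build_prompt(nlq, SCHEMA, snippets))
    return results
```

Running `run_experiment` twice on the same question set, once with `use_context=True` and once with `use_context=False`, mirrors the with-and-without comparison described above under these assumptions.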
Our approach significantly improved NLQ-to-SQL accuracy, from 8.3% with the database schema alone to 78.3% with the business context document. The improvement was consistent across low-, medium-, and high-complexity queries, indicating the critical role of contextual knowledge in query generation.
The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Accuracy reached a maximum of 85% when high-complexity queries were excluded, suggesting promise for routine deployment.
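The two-part accuracy criterion (executable and returning semantically appropriate results) can be sketched in Python as follows. This is only an approximation under stated assumptions: it uses an in-memory SQLite table with made-up rows, and it substitutes "returns the same rows as a reference query" for the semantic judgment, which the study may have made differently. The names `make_demo_db`, `is_accurate`, and `accuracy` are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (case_id INTEGER, drug TEXT, serious INTEGER)")
    conn.executemany(
        "INSERT INTO cases VALUES (?, ?, ?)",
        [(1, "DRUG_A", 1), (2, "DRUG_A", 0), (3, "DRUG_B", 1)],
    )
    return conn


def is_accurate(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True only if the generated query runs AND matches the reference rows."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # not executable
    expected = conn.execute(reference_sql).fetchall()
    # Compare as unordered collections of rows (assumption: order is irrelevant).
    return sorted(got, key=repr) == sorted(expected, key=repr)


def accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Percentage of (generated, reference) pairs judged accurate."""
    if not pairs:
        return 0.0
    hits = sum(is_accurate(conn, gen, ref) for gen, ref in pairs)
    return 100.0 * hits / len(pairs)
```

A query that executes but answers a different question (for example, counting all cases instead of serious cases) is scored as inaccurate here, which is the distinction the parenthetical definition draws.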
This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.