Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.

Author information

Painter Jeffery L, Chalamalasetti Venkateswara Rao, Kassekert Raymond, Bate Andrew

Affiliations

GlaxoSmithKline, Durham, NC 27701, United States.

Tech Mahindra, Plano, TX 75024, United States.

Publication information

JAMIA Open. 2025 Feb 8;8(1):ooaf003. doi: 10.1093/jamiaopen/ooaf003. eCollection 2025 Feb.

Abstract

OBJECTIVE

To enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.

MATERIALS AND METHODS

We used OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM in random order and independently of the others to prevent memorization. The study was conducted in 3 phases that varied query complexity and assessed the LLM's performance both with and without the business context document.
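The prompt-assembly step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-overlap retriever, the prompt wording, and the names `retrieve_context`, `build_prompt`, and `run_study` are all assumptions made for illustration, and `llm` stands in for whatever model call is used (the abstract names GPT-4 but gives no API details).

```python
import random
from typing import Callable, Dict, List


def retrieve_context(nlq: str, context_chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank context chunks by word overlap with the question."""
    q_words = set(nlq.lower().split())
    ranked = sorted(
        context_chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,  # sorted() is stable, so ties keep their original order
    )
    return ranked[:k]


def build_prompt(nlq: str, schema: str, context_chunks: List[str]) -> str:
    """Combine the schema, retrieved business context, and the question into one prompt."""
    context = "\n".join(f"- {c}" for c in retrieve_context(nlq, context_chunks))
    return (
        "Database schema:\n" + schema + "\n\n"
        "Business context:\n" + context + "\n\n"
        "Question: " + nlq + "\n"
        "Return a single SQL query."
    )


def run_study(
    nlqs: List[str],
    schema: str,
    context_chunks: List[str],
    llm: Callable[[str], str],
    seed: int = 0,
) -> Dict[str, str]:
    """Send each NLQ in shuffled order, one fresh prompt per question (no shared history)."""
    order = list(nlqs)
    random.Random(seed).shuffle(order)
    return {q: llm(build_prompt(q, schema, context_chunks)) for q in order}
```

Passing an empty `context_chunks` list gives the schema-only condition, so the same function can serve both arms of a with/without comparison.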

RESULTS

Our approach significantly improved NLQ-to-SQL accuracy, which rose from 8.3% with the database schema alone to 78.3% with the business context document. This improvement was consistent across low-, medium-, and high-complexity queries, indicating the critical role of contextual knowledge in query generation.

DISCUSSION

The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Performance reached a maximum of 85% when high-complexity queries were excluded, suggesting promise for routine deployment.
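The two-part accuracy criterion (executable, and returning semantically appropriate results) can be sketched as an automated check. This is a hedged illustration only: comparing against a hand-written reference query is an assumption, since the abstract does not say how semantic appropriateness was judged, and the `cases` table, its rows, and the function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def is_accurate(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True if the generated query runs and returns the same rows as the reference query."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # fails the "executable" half of the criterion
    expected = conn.execute(reference_sql).fetchall()
    # Compare as multisets of rows; sorting by repr ignores row order and tolerates NULLs.
    return sorted(got, key=repr) == sorted(expected, key=repr)


def accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (generated, reference) query pairs judged accurate."""
    return sum(is_accurate(conn, g, r) for g, r in pairs) / len(pairs)


# Hypothetical toy database standing in for a PV case table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, serious INTEGER)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])

reference = "SELECT COUNT(*) FROM cases WHERE serious = 1"
pairs = [
    ("SELECT COUNT(*) FROM cases WHERE serious = 1", reference),  # correct
    ("SELECT COUNT(*) FROM cases", reference),                    # runs, wrong result
    ("SELECT COUNT(*) FROM case_table", reference),               # not executable
]
score = accuracy(conn, pairs)
```

Here only the first of the three generated queries passes both halves of the criterion. Result-set equality is a stricter and narrower test than human review, so it should be read as one possible operationalization rather than the study's method.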

CONCLUSION

This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cf7d/11806702/85c58ef98c6c/ooaf003f1.jpg