Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.

Author information

Painter Jeffery L, Chalamalasetti Venkateswara Rao, Kassekert Raymond, Bate Andrew

Affiliations

GlaxoSmithKline, Durham, NC 27701, United States.

Tech Mahindra, Plano, TX 75024, United States.

Publication information

JAMIA Open. 2025 Feb 8;8(1):ooaf003. doi: 10.1093/jamiaopen/ooaf003. eCollection 2025 Feb.

DOI:10.1093/jamiaopen/ooaf003

PMID:39926164

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11806702/

Abstract

OBJECTIVE

To enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.

MATERIALS AND METHODS

We used OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in 3 phases that varied query complexity and assessed the LLM's performance both with and without the business context document.
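To make the with/without-context comparison concrete, the following is a minimal sketch of how such prompts might be assembled. It is not the authors' pipeline: the function names, prompt wording, and the idea of passing the whole business context as one string (rather than retrieving relevant passages, as a full RAG setup would) are assumptions made for illustration, and no model is actually called.

```python
import random

def build_prompt(nlq, schema, business_context=None):
    """Assemble a plain-text prompt for one NLQ-to-SQL request.

    All wording here is hypothetical; the paper does not publish its prompt.
    """
    parts = [
        "Translate the question into a single SQL query.",
        "Database schema:\n" + schema,
    ]
    # The business context is optional so the same code covers both study arms.
    if business_context:
        parts.append("Business context:\n" + business_context)
    parts.append("Question: " + nlq)
    parts.append("SQL:")
    return "\n\n".join(parts)

def randomized_prompts(nlqs, schema, business_context=None, seed=0):
    """Shuffle the questions and build one stand-alone prompt per question.

    Each prompt carries no conversation history, so every question is
    presented independently of the others.
    """
    order = list(nlqs)
    random.Random(seed).shuffle(order)
    return [build_prompt(q, schema, business_context) for q in order]
```

Calling `randomized_prompts(questions, schema)` and `randomized_prompts(questions, schema, context)` on the same question list would yield the two arms of such a comparison.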

RESULTS

Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual knowledge in query generation.

DISCUSSION

The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Performance reached a maximum of 85% when high-complexity queries were excluded, suggesting promise for routine deployment.
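The two-part notion of accuracy used here (the query executes, and its results are semantically appropriate) can be sketched in code. The sketch below assumes that "semantically appropriate" is approximated by comparing result rows against a hand-written reference query; the abstract does not say how the authors judged this, so treat that comparison, the function names, and the use of SQLite as illustrative assumptions.

```python
import sqlite3

def is_accurate(conn, generated_sql, reference_sql):
    """Return True if generated_sql runs and yields the same rows as reference_sql."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # query is not executable
    expected = conn.execute(reference_sql).fetchall()
    # Compare as unordered collections of rows; row order is ignored.
    return sorted(got, key=repr) == sorted(expected, key=repr)

def accuracy(conn, pairs):
    """Fraction of (generated_sql, reference_sql) pairs judged accurate."""
    hits = sum(is_accurate(conn, gen, ref) for gen, ref in pairs)
    return hits / len(pairs)
```

Row-set comparison is a deliberately strict proxy: a generated query that returns extra columns or a differently aggregated answer would count as wrong even if a human reviewer might accept it.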

CONCLUSION

This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.
