Reason Tim, Benbow Emma, Langham Julia, Gimblett Andy, Klijn Sven L, Malcolm Bill
Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK.
Bristol Myers Squibb, Princeton, NJ, USA.
Pharmacoecon Open. 2024 Mar;8(2):205-220. doi: 10.1007/s41669-024-00476-9. Epub 2024 Feb 10.
The emergence of artificial intelligence, capable of human-level performance on some tasks, presents an opportunity to revolutionise the development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aim to assess the use of a large language model (LLM; Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA and interpret the results.
We considered four case studies involving binary and time-to-event outcomes in two disease areas, for which an NMA had previously been conducted manually. For each case study, a Python script was developed that communicated with the LLM via application programming interface (API) calls. The LLM was prompted to extract relevant data from publications, to create an R script to be used to run the NMA and then to produce a small report describing the analysis.
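The extraction step described above can be illustrated with a minimal sketch. It is not the authors' script: the prompt wording, the JSON layout, the function names and the `fake_llm` stand-in are all assumptions made for illustration, and the arm-level numbers are invented. The LLM call is passed in as a plain callable so that no particular API client is implied.

```python
import json
from typing import Callable, Dict, List

# Hypothetical schema for arm-level binary outcome data (an assumption).
REQUIRED_KEYS = {"treatment", "events", "n"}


def build_extraction_prompt(publication_text: str, outcome: str) -> str:
    """Assemble a prompt asking for arm-level data as JSON only."""
    return (
        "Extract the number of events and the number of patients per "
        f"treatment arm for the outcome '{outcome}' from the text below. "
        "Reply with JSON only: a list of objects with the keys "
        '"treatment", "events" and "n".\n\n' + publication_text
    )


def extract_arm_data(
    publication_text: str,
    outcome: str,
    call_llm: Callable[[str], str],
) -> List[Dict]:
    """Send the prompt via `call_llm` and apply basic technical checks."""
    reply = call_llm(build_extraction_prompt(publication_text, outcome))
    rows = json.loads(reply)
    for row in rows:
        if not isinstance(row, dict) or not REQUIRED_KEYS.issubset(row):
            raise ValueError(f"Missing keys in extracted row: {row!r}")
        if not 0 <= row["events"] <= row["n"]:
            raise ValueError(f"Implausible counts in extracted row: {row!r}")
    return rows


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call; returns invented illustrative data."""
    return json.dumps(
        [
            {"treatment": "Treatment A", "events": 12, "n": 50},
            {"treatment": "Treatment B", "events": 20, "n": 52},
        ]
    )


rows = extract_arm_data("(publication text here)", "overall response", fake_llm)
```

In practice `fake_llm` would be replaced by a function that wraps the chosen API client, and the validated rows would be passed on to the step that generates the R script.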
The LLM achieved a > 99% success rate in accurately extracting data across 20 runs for each case study and could generate R scripts that ran end-to-end without human input. It also produced good-quality reports describing the disease area, the analysis conducted and the results obtained, together with a correct interpretation of the results.
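One way such a success rate could be scored is sketched below. The abstract does not state the scoring unit, so counting each extracted data point in each run against a manually extracted reference is an assumption; the function name and the example values are likewise hypothetical.

```python
from typing import Dict, Hashable, List


def extraction_success_rate(
    runs: List[Dict[Hashable, float]],
    reference: Dict[Hashable, float],
) -> float:
    """Share of reference data points reproduced exactly, pooled over runs."""
    total = 0
    correct = 0
    for run in runs:
        for key, expected in reference.items():
            total += 1
            # A missing key counts as a failed extraction.
            if run.get(key) == expected:
                correct += 1
    return correct / total


# Invented reference values keyed by (trial, arm, field).
reference = {
    ("Trial 1", "A", "events"): 12,
    ("Trial 1", "A", "n"): 50,
    ("Trial 1", "B", "events"): 20,
    ("Trial 1", "B", "n"): 52,
}

# Five simulated runs; the last one contains a single wrong value.
runs = [dict(reference) for _ in range(5)]
runs[4][("Trial 1", "B", "events")] = 2

rate = extraction_success_rate(runs, reference)
```

Here 19 of the 20 pooled data points match the reference, so `rate` is 0.95.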
This study provides a promising indication of the feasibility of using current-generation LLMs to automate data extraction, code generation and NMA result interpretation, which could result in significant time savings and reduce human error, provided that routine technical checks are performed, as recommended for human-conducted analyses. Whilst not currently 100% consistent, LLMs are likely to improve with time.