Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

Author information

Reason Tim, Benbow Emma, Langham Julia, Gimblett Andy, Klijn Sven L, Malcolm Bill

Affiliations

Estima Scientific, Mediaworks, 191 Wood Lane, London, W12 7FP, UK.

Bristol Myers Squibb, Princeton, NJ, USA.

Publication information

Pharmacoecon Open. 2024 Mar;8(2):205-220. doi: 10.1007/s41669-024-00476-9. Epub 2024 Feb 10.

DOI:10.1007/s41669-024-00476-9

PMID:38340277

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10884375/

Abstract

BACKGROUND

The emergence of artificial intelligence capable of human-level performance on some tasks presents an opportunity to revolutionise the development of systematic reviews and network meta-analyses (NMAs). In this pilot study, we aimed to assess the use of a large language model (LLM; Generative Pre-trained Transformer 4 [GPT-4]) to automatically extract data from publications, write an R script to conduct an NMA, and interpret the results.

METHODS

We considered four case studies involving binary and time-to-event outcomes in two disease areas, for which an NMA had previously been conducted manually. For each case study, a Python script was developed that communicated with the LLM via application programming interface (API) calls. The LLM was prompted to extract relevant data from publications, to create an R script to be used to run the NMA and then to produce a small report describing the analysis.
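The workflow described above could be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names (`extract_trial_data`, `request_r_script`, `request_report`), the prompt wording, the JSON layout and the `fake_llm` stand-in are all assumptions introduced here. A real pipeline would replace `fake_llm` with a call to the LLM provider's API and would run the generated R script between the second and third steps.

```python
import json
from typing import Callable

# Hypothetical type: any function that takes a prompt string and returns the
# model's text reply (for example, a thin wrapper around an LLM API client).
LLM = Callable[[str], str]


def extract_trial_data(llm: LLM, publication_text: str) -> dict:
    """Ask the model for binary-outcome data as JSON and parse the reply."""
    prompt = (
        "Extract the number of events and patients per treatment arm from the "
        "publication below. Reply with JSON only, in the form "
        '{"study": str, "arms": [{"treatment": str, "events": int, "n": int}]}.'
        "\n\n" + publication_text
    )
    data = json.loads(llm(prompt))
    # Basic technical check: events can never exceed the number of patients.
    for arm in data["arms"]:
        if not 0 <= arm["events"] <= arm["n"]:
            raise ValueError(f"Implausible arm data: {arm}")
    return data


def request_r_script(llm: LLM, trials: list) -> str:
    """Ask the model to write an R script for an NMA on the extracted data."""
    prompt = (
        "Write an R script that runs a network meta-analysis on this data:\n"
        + json.dumps(trials, indent=2)
    )
    return llm(prompt)


def request_report(llm: LLM, nma_output: str) -> str:
    """Ask the model to describe and interpret the NMA results."""
    return llm("Write a short report interpreting these NMA results:\n" + nma_output)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline (illustrative data)."""
    if prompt.startswith("Extract"):
        return json.dumps({
            "study": "TRIAL-1",
            "arms": [
                {"treatment": "A", "events": 12, "n": 100},
                {"treatment": "B", "events": 20, "n": 100},
            ],
        })
    if prompt.startswith("Write an R script"):
        return "# R script placeholder"
    return "Report placeholder"


if __name__ == "__main__":
    trial = extract_trial_data(fake_llm, "full text of one publication")
    script = request_r_script(fake_llm, [trial])
    # The generated script would be executed with R here; its output is faked.
    report = request_report(fake_llm, "placeholder NMA output")
    print(trial["study"], len(script), len(report))
```

Passing the model in as a plain callable keeps the extraction and validation logic testable without network access, and the range check on each arm is one example of the routine technical checks that any extracted data would still need.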

RESULTS

The LLM had a > 99% success rate of accurately extracting data across 20 runs for each case study and could generate R scripts that could be run end-to-end without human input. It also produced good quality reports describing the disease area, analysis conducted, results obtained and a correct interpretation of the results.
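One way an extraction success rate over repeated runs might be tallied is shown below. The abstract does not define the metric, so this sketch assumes it is the share of data points, pooled over all runs, that exactly match a manually extracted reference. The function name `extraction_accuracy`, the key names and all numbers are illustrative and are not taken from the study.

```python
def extraction_accuracy(runs: list, reference: dict) -> float:
    """Share of data points, pooled over all runs, that match the reference."""
    total = 0
    correct = 0
    for run in runs:
        for key, expected in reference.items():
            total += 1
            # A missing key counts as a failed extraction.
            if run.get(key) == expected:
                correct += 1
    if total == 0:
        raise ValueError("No data points to compare")
    return correct / total


if __name__ == "__main__":
    # Illustrative reference values for two data points in one trial.
    reference = {"arm_A_events": 12, "arm_A_n": 100}
    # 20 simulated runs: 19 fully correct, 1 with a single wrong value.
    runs = [dict(reference) for _ in range(19)]
    runs.append({"arm_A_events": 21, "arm_A_n": 100})
    print(f"{extraction_accuracy(runs, reference):.3f}")
```

Pooling over data points rather than over whole runs means a single wrong value lowers the rate only slightly, so a run-level definition (every value in the run correct) would give a stricter figure on the same data.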

CONCLUSIONS

This study provides a promising indication of the feasibility of using current-generation LLMs to automate data extraction, code generation and NMA result interpretation, which could result in significant time savings and reduce human error, provided that routine technical checks are performed, as recommended for human-conducted analyses. Whilst not currently 100% consistent, LLMs are likely to improve with time.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f612/10884375/bac5845cc2f7/41669_2024_476_Fig1_HTML.jpg

Similar articles

Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models.

Pharmacoecon Open. 2024 Mar;8(2):191-203. doi: 10.1007/s41669-024-00477-8. Epub 2024 Feb 10.

[A Guide to Network Meta-Analysis Using Generative AI and No-Code Tools].

Hu Li Za Zhi. 2024 Oct;71(5):29-35. doi: 10.6224/JN.202410_71(5).05.

Large Language Models Can Enable Inductive Thematic Analysis of a Social Media Corpus in a Single Prompt: Human Validation Study.

JMIR Infodemiology. 2024 Aug 29;4:e59641. doi: 10.2196/59641.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.

J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.

The future of Cochrane Neonatal.

Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.

A Generative Pretrained Transformer (GPT)-Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study.

JMIR Med Educ. 2024 Jan 16;10:e53961. doi: 10.2196/53961.

ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine.

J Pediatr Urol. 2023 Oct;19(5):598-604. doi: 10.1016/j.jpurol.2023.05.018. Epub 2023 Jun 2.

Cited by

Artificial intelligence across the cancer care continuum.

Cancer. 2025 Aug 15;131(16):e70050. doi: 10.1002/cncr.70050.

Protective multi-stressor interactions in the Anthropocene: Key considerations for investigating cross-tolerance in a conservation context.

Conserv Physiol. 2025 Jul 30;13(1):coaf052. doi: 10.1093/conphys/coaf052. eCollection 2025.

Integration of Generative AI with Human Expertise in HEOR: A Hybrid Intelligence Framework.

Adv Ther. 2025 Jun 25. doi: 10.1007/s12325-025-03273-w.

Streamlining systematic reviews with large language models using prompt engineering and retrieval augmented generation.

BMC Med Res Methodol. 2025 May 10;25(1):130. doi: 10.1186/s12874-025-02583-5.

The emergence of large language models as tools in literature reviews: a large language model-assisted systematic review.

J Am Med Inform Assoc. 2025 Jun 1;32(6):1071-1086. doi: 10.1093/jamia/ocaf063.

Using Generative Artificial Intelligence in Health Economics and Outcomes Research: A Primer on Techniques and Breakthroughs.

Pharmacoecon Open. 2025 Apr 29. doi: 10.1007/s41669-025-00580-4.

Large Language Models and Their Applications in Drug Discovery and Development: A Primer.

Clin Transl Sci. 2025 Apr;18(4):e70205. doi: 10.1111/cts.70205.

Methodologies for network meta-analysis of randomised controlled trials in pain, anaesthesia, and perioperative medicine: a narrative review.

Br J Anaesth. 2025 Apr;134(4):1029-1040. doi: 10.1016/j.bja.2024.12.039. Epub 2025 Feb 19.

Collaborative large language models for automated data extraction in living systematic reviews.

J Am Med Inform Assoc. 2025 Apr 1;32(4):638-647. doi: 10.1093/jamia/ocae325.

Generative Artificial Intelligence for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations: An ISPOR Working Group Report.

Value Health. 2025 Feb;28(2):175-183. doi: 10.1016/j.jval.2024.10.3846. Epub 2024 Nov 12.
