Artificial Intelligence to Automate Health Economic Modelling: A Case Study to Evaluate the Potential Application of Large Language Models.

Author information

Reason Tim, Rawlinson William, Langham Julia, Gimblett Andy, Malcolm Bill, Klijn Sven

Affiliations

Estima Scientific, Mediaworks, 191 Wood Ln, London, W12 7FP, UK.

Bristol Myers Squibb, Uxbridge, UK.

Publication information

Pharmacoecon Open. 2024 Mar;8(2):191-203. doi: 10.1007/s41669-024-00477-8. Epub 2024 Feb 10.

DOI:10.1007/s41669-024-00477-8

PMID:38340276

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10884386/

Abstract

BACKGROUND

Current-generation large language models (LLMs) such as Generative Pre-Trained Transformer 4 (GPT-4) have achieved human-level performance on many tasks, including the generation of computer code from textual input. This study aimed to assess whether GPT-4 could be used to automatically programme two published health economic analyses.

METHODS

The two analyses were partitioned survival models evaluating interventions in non-small cell lung cancer (NSCLC) and renal cell carcinoma (RCC). We developed prompts which instructed GPT-4 to programme the NSCLC and RCC models in R, and which provided descriptions of each model's methods, assumptions and parameter values. The results of the generated scripts were compared with the published values from the original, human-programmed models. Each model was replicated 15 times to capture variability in GPT-4's output.
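
In a partitioned survival model, the share of patients in each health state at time t is read directly from the progression-free survival (PFS) and overall survival (OS) curves rather than derived from transition probabilities. The Python sketch below shows that structure for one treatment arm of a three-state model. It is not the NSCLC or RCC model from the study, which GPT-4 programmed in R. The exponential survival curves, the parameter values, the monthly cycles, the 3.5% discount rate, the absence of a half-cycle correction, and the function names `exp_survival` and `partitioned_survival` are all hypothetical choices made for illustration.

```python
import math


def exp_survival(rate: float, t: float) -> float:
    """Hypothetical exponential survival curve S(t) = exp(-rate * t), t in years."""
    return math.exp(-rate * t)


def partitioned_survival(pfs_rate: float, os_rate: float,
                         cost_pf: float, cost_pd: float,
                         util_pf: float, util_pd: float,
                         horizon_years: float = 20.0,
                         cycles_per_year: int = 12,
                         disc_rate: float = 0.035) -> tuple[float, float]:
    """Discounted (cost, QALYs) for one arm of a three-state PSM (illustrative).

    State membership is read directly off the survival curves:
      progression-free = S_PFS(t)
      progressed       = S_OS(t) - S_PFS(t)
      dead             = 1 - S_OS(t)  (assumed to accrue no cost or utility)
    Costs and utilities are annual values; no half-cycle correction is applied.
    """
    dt = 1.0 / cycles_per_year                  # cycle length in years
    n_cycles = int(round(horizon_years * cycles_per_year))
    total_cost = 0.0
    total_qaly = 0.0
    for i in range(n_cycles):
        t = i * dt                              # time at start of cycle
        s_os = exp_survival(os_rate, t)
        # PFS can never exceed OS; capping keeps the progressed state non-negative
        s_pfs = min(exp_survival(pfs_rate, t), s_os)
        progression_free = s_pfs
        progressed = s_os - s_pfs
        discount = 1.0 / (1.0 + disc_rate) ** t
        total_cost += (progression_free * cost_pf + progressed * cost_pd) * dt * discount
        total_qaly += (progression_free * util_pf + progressed * util_pd) * dt * discount
    return total_cost, total_qaly


# Hypothetical inputs for two arms (not taken from the NSCLC or RCC models)
comparator = partitioned_survival(pfs_rate=0.6, os_rate=0.3,
                                  cost_pf=20_000, cost_pd=10_000,
                                  util_pf=0.75, util_pd=0.60)
intervention = partitioned_survival(pfs_rate=0.4, os_rate=0.2,
                                    cost_pf=60_000, cost_pd=10_000,
                                    util_pf=0.75, util_pd=0.60)
print(f"Comparator:   cost {comparator[0]:,.0f}, QALYs {comparator[1]:.3f}")
print(f"Intervention: cost {intervention[0]:,.0f}, QALYs {intervention[1]:.3f}")
```

A real analysis would typically fit parametric survival distributions to trial data instead of assuming exponential curves. The PFS-capped-at-OS step is one common way to avoid negative state occupancy.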

RESULTS

GPT-4 fully replicated the NSCLC model with high accuracy: 100% (15/15) of the artificial intelligence (AI)-generated NSCLC models were error-free or contained a single minor error, and 93% (14/15) were completely error-free. GPT-4 closely replicated the RCC model, although human intervention was required to simplify one element of the model design (one of the model's fifteen input calculations) because that calculation involved too many sequential steps to be implemented in a single prompt. With this simplification, 87% (13/15) of the AI-generated RCC models were error-free or contained a single minor error, and 60% (9/15) were completely error-free. Error-free model scripts replicated the published incremental cost-effectiveness ratios to within 1%.
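
The accuracy criterion above rests on the incremental cost-effectiveness ratio (ICER). The ICER is the difference in costs between two arms divided by the difference in health outcomes, typically quality-adjusted life-years (QALYs). The Python sketch below computes an ICER and applies a 1% tolerance check. The function names `icer` and `within_tolerance`, the choice to measure the 1% relative to the published value, and all numbers are illustrative assumptions, not the authors' validation code.

```python
def icer(cost_new: float, qaly_new: float,
         cost_old: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return (cost_new - cost_old) / delta_qaly


def within_tolerance(replicated: float, published: float,
                     rel_tol: float = 0.01) -> bool:
    """True if replicated is within rel_tol (default 1%) of published."""
    return abs(replicated - published) <= rel_tol * abs(published)


# Hypothetical totals, not results from the study
replicated_icer = icer(cost_new=150_000, qaly_new=3.0,
                       cost_old=50_000, qaly_old=2.0)
print(replicated_icer)                              # 100000.0
print(within_tolerance(replicated_icer, 100_500))   # True: about 0.5% difference
print(within_tolerance(replicated_icer, 102_000))   # False: about 2% difference
```

Measuring the difference relative to the published value keeps the check independent of the currency and scale of the ICER.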

CONCLUSION

This study provides a promising indication that GPT-4 can have practical applications in the automation of health economic model construction. Potential benefits include accelerated model development timelines and reduced costs of development. Further research is necessary to explore the generalisability of LLM-based automation across a larger sample of models.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f6cf/10884386/5d1c9a59f93b/41669_2024_477_Fig1_HTML.jpg

Similar articles

Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.

J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.

Can large language models replace humans in systematic reviews? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages.

Res Synth Methods. 2024 Jul;15(4):616-626. doi: 10.1002/jrsm.1715. Epub 2024 Mar 14.

Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study.

J Med Internet Res. 2023 Oct 30;25:e49324. doi: 10.2196/49324.

Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine.

Clin Pract. 2023 Nov 20;13(6):1460-1487. doi: 10.3390/clinpract13060130.

Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.

JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.

Artificial Intelligence to Automate Network Meta-Analyses: Four Case Studies to Evaluate the Potential Application of Large Language Models.

Pharmacoecon Open. 2024 Mar;8(2):205-220. doi: 10.1007/s41669-024-00476-9. Epub 2024 Feb 10.

Comparative Evaluation of LLMs in Clinical Oncology.

NEJM AI. 2024 May;1(5). doi: 10.1056/aioa2300151. Epub 2024 Apr 16.

GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

Can AI Think Like a Plastic Surgeon? Evaluating GPT-4's Clinical Judgment in Reconstructive Procedures of the Upper Extremity.

Plast Reconstr Surg Glob Open. 2023 Dec 13;11(12):e5471. doi: 10.1097/GOX.0000000000005471. eCollection 2023 Dec.

Cited by

Integration of Generative AI with Human Expertise in HEOR: A Hybrid Intelligence Framework.

Adv Ther. 2025 Jun 25. doi: 10.1007/s12325-025-03273-w.

Ethical Challenges and Opportunities of AI in End-of-Life Palliative Care: Integrative Review.

Interact J Med Res. 2025 May 14;14:e73517. doi: 10.2196/73517.

Using Generative Artificial Intelligence in Health Economics and Outcomes Research: A Primer on Techniques and Breakthroughs.

Pharmacoecon Open. 2025 Apr 29. doi: 10.1007/s41669-025-00580-4.

Using AI in the Economic Evaluation of AI-Based Health Technologies.

Pharmacoeconomics. 2025 Jun;43(6):597-600. doi: 10.1007/s40273-025-01496-x. Epub 2025 Apr 23.

Potential Meets Practicality: AI's Current Impact on the Evidence Generation and Synthesis Pipeline in Health Economics.

Clin Transl Sci. 2025 Apr;18(4):e70206. doi: 10.1111/cts.70206.

How much can we save by applying artificial intelligence in evidence synthesis? Results from a pragmatic review to quantify workload efficiencies and cost savings.

Front Pharmacol. 2025 Jan 31;16:1454245. doi: 10.3389/fphar.2025.1454245. eCollection 2025.

R WE ready for reimbursement? A round up of developments in real-world evidence relating to health technology assessment: part 17.

J Comp Eff Res. 2025 Jan;14(1):e240212. doi: 10.57264/cer-2024-0212. Epub 2024 Nov 27.

Generative Artificial Intelligence for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations: An ISPOR Working Group Report.

Value Health. 2025 Feb;28(2):175-183. doi: 10.1016/j.jval.2024.10.3846. Epub 2024 Nov 12.

Automated Mass Extraction of Over 680,000 PICOs from Clinical Study Abstracts Using Generative AI: A Proof-of-Concept Study.

Pharmaceut Med. 2024 Sep;38(5):365-372. doi: 10.1007/s40290-024-00539-6. Epub 2024 Sep 26.

The transformative impact of large language models on medical writing and publishing: current applications, challenges and future directions.

Korean J Physiol Pharmacol. 2024 Sep 1;28(5):393-401. doi: 10.4196/kjpp.2024.28.5.393.
