Improving Prediction of Complications Post-Proton Therapy in Lung Cancer Using Large Language Models and Meta-Analysis.
AFFILIATIONS
Medical Physics and Informatics Laboratory of Electronics Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan.
Department of Radiation Oncology, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, Taiwan.
PUBLICATION INFORMATION
Cancer Control. 2024 Jan-Dec;31:10732748241286749. doi: 10.1177/10732748241286749.
PURPOSE
This study enhances the efficiency of predicting complications in lung cancer patients receiving proton therapy by utilizing large language models (LLMs) and meta-analytical techniques for literature quality assessment.
MATERIALS AND METHODS
We integrated systematic reviews with LLM evaluations, sourcing studies from Web of Science, PubMed, and Scopus and managing them via EndNote X20. Inclusion and exclusion criteria ensured literature relevance. Techniques included meta-analysis, heterogeneity assessment using Cochran's Q test and the I² statistic, and subgroup analyses for different complications. Quality and risk of bias were assessed using the PROBAST tool and further analyzed with models such as ChatGPT-4, Llama2-13b, and Llama3-8b. Evaluation metrics included AUC, accuracy, precision, recall, F1 score, and time efficiency (words per minute, WPM).
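The heterogeneity assessment named above can be sketched in a few lines. The snippet below computes a fixed-effect inverse-variance pooled estimate, Cochran's Q, and the I² index; the per-study effect sizes and variances are illustrative placeholders, not values from this study.

```python
def cochran_q_and_i2(effects, variances):
    """Fixed-effect pooled estimate, Cochran's Q, and I^2 (%).

    effects:   per-study effect sizes (e.g., predictive AUCs)
    variances: per-study sampling variances
    """
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

    # I^2: share of total variability due to between-study heterogeneity,
    # clipped at 0 when Q falls below its degrees of freedom.
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, q, i2

# Hypothetical data for three studies (not from the paper).
pooled, q, i2 = cochran_q_and_i2(
    effects=[0.70, 0.85, 0.78],
    variances=[0.001, 0.001, 0.001],
)
```

An I² near 0% would indicate that study results differ only by sampling error, while values above roughly 75% (as the 72.88% reported below approaches) flag substantial heterogeneity that motivates the subgroup analyses.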
RESULTS
The meta-analysis revealed an overall effect size of 0.78 for model predictions, with high heterogeneity observed (I² = 72.88%, P < 0.001). Subgroup analyses for radiation-induced esophagitis and pneumonitis yielded predictive effect sizes of 0.79 and 0.77, respectively, with a heterogeneity index (I²) of 0%, indicating no significant differences among the models in predicting these specific complications. Literature assessment using LLMs showed that ChatGPT-4 achieved the highest accuracy at 90%, significantly outperforming the Llama3 and Llama2 models, whose accuracies ranged from 44% to 62%. Additionally, LLM evaluations were conducted 3229 times faster than manual assessments, markedly enhancing both efficiency and accuracy. The risk assessment identified nine studies as high risk, three as low risk, and one as unknown, confirming the robustness of ChatGPT-4 across the evaluation metrics.
CONCLUSION
This study demonstrated that integrating large language models with meta-analysis techniques can significantly increase the efficiency of literature evaluation and reduce the time required for assessments, and it confirmed that there are no significant differences among models in predicting post-proton-therapy complications in lung cancer patients.