
GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines.

Authors

Wrightson James G, Blazey Paul, Moher David, Khan Karim M, Ardern Clare L

Affiliations

Department of Physical Therapy, The University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada.

Centre for Aging SMART, The University of British Columbia, Vancouver, British Columbia, Canada.

Publication

BMJ Open. 2025 Mar 18;15(3):e088735. doi: 10.1136/bmjopen-2024-088735.

DOI: 10.1136/bmjopen-2024-088735
PMID: 40107689
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11927406/
Abstract

OBJECTIVES

Adherence to established reporting guidelines can improve clinical trial reporting standards, but attempts to improve adherence have produced mixed results. This exploratory study aimed to determine how accurate a large language model generative artificial intelligence system (AI-LLM) was for determining reporting guideline compliance in a sample of sports medicine clinical trial reports.

DESIGN

This study was an exploratory retrospective data analysis. OpenAI GPT-4 and Meta Llama 2 AI-LLM were evaluated for their ability to determine reporting guideline adherence in a sample of sports medicine and exercise science clinical trial reports.

SETTING

Academic research institution.

PARTICIPANTS

The study sample included 113 published sports medicine and exercise science clinical trial papers. For each paper, the GPT-4 Turbo and Llama 2 70B models were prompted to answer a series of nine reporting guideline questions about the text of the article. The GPT-4 Vision model was prompted to answer two additional reporting guideline questions about the participant flow diagram in a subset of articles. The dataset was randomly split (80/20) into a TRAIN and TEST dataset. Hyperparameter tuning and fine-tuning were performed using the TRAIN dataset. The Llama 2 model was fine-tuned using the data from the GPT-4 Turbo analysis of the TRAIN dataset.
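The 80/20 random split described above can be sketched in plain Python; the paper identifiers, seed, and helper name below are illustrative, not taken from the study:

```python
import random

def split_train_test(items, test_frac=0.2, seed=42):
    """Randomly split a list of papers into TRAIN and TEST subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

papers = [f"paper_{i}" for i in range(113)]  # 113 trial reports, as in the study sample
train, test = split_train_test(papers)
print(len(train), len(test))  # 90 23
```

With 113 papers an 80/20 split yields 90 TRAIN and 23 TEST papers; holding the TEST set out of all prompt tuning and fine-tuning is what makes the reported F1-scores an out-of-sample estimate.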

PRIMARY AND SECONDARY OUTCOME MEASURES

The primary outcome was the F1-score, a measure of model performance on the TEST dataset. The secondary outcome was the model's classification accuracy (%).

RESULTS

Across all questions about the article text, the GPT-4 Turbo AI-LLM demonstrated acceptable performance (F1-score=0.89, accuracy (95% CI) = 90% (85% to 94%)). Accuracy for all reporting guidelines was >80%. The Llama 2 model accuracy was initially poor (F1-score=0.63, accuracy (95% CI) = 64% (57% to 71%)) and improved with fine-tuning (F1-score=0.84, accuracy (95% CI) = 83% (77% to 88%)). The GPT-4 Vision model accurately identified all participant flow diagrams (accuracy (95% CI) = 100% (89% to 100%)) but was less accurate at identifying when details were missing from the flow diagram (accuracy (95% CI) = 57% (39% to 73%)).
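For reference, the F1-score, accuracy, and 95% confidence intervals of the kind reported above can be computed from raw per-question adherence judgments. This is a generic sketch (the Wilson score interval is one common choice for proportions near 0% or 100%), not the authors' analysis code:

```python
import math

def f1_and_accuracy(y_true, y_pred):
    """Binary F1-score and accuracy from parallel 0/1 label lists
    (1 = guideline item judged reported)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return f1, accuracy

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% interval for a proportion; unlike the normal
    approximation it stays inside [0, 1] even when accuracy is 100%."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Note that an asymmetric interval such as 100% (89% to 100%) for the flow diagram question is exactly the shape a score-based interval produces at a proportion of 1.0.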

CONCLUSIONS

Both the GPT-4 and fine-tuned Llama 2 AI-LLMs showed promise as tools for assessing reporting guideline compliance. Next steps should include developing an efficient, open-source AI-LLM and exploring methods to improve model accuracy.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c319/11927406/b10bd39e3994/bmjopen-15-3-g001.jpg

Similar Articles

1. GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines. BMJ Open. 2025 Mar 18;15(3):e088735. doi: 10.1136/bmjopen-2024-088735.
2. Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports. Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.
3. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas. Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4. Assessing Completeness of Clinical Histories Accompanying Imaging Orders Using Adapted Open-Source and Closed-Source Large Language Models. Radiology. 2025 Feb;314(2):e241051. doi: 10.1148/radiol.241051.
5. GPT-Driven Radiology Report Generation with Fine-Tuned Llama 3. Bioengineering (Basel). 2024 Oct 18;11(10):1043. doi: 10.3390/bioengineering11101043.
6. Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports. Radiology. 2024 Oct;313(1):e241139. doi: 10.1148/radiol.241139.
7. Large Language Models' Accuracy in Emulating Human Experts' Evaluation of Public Sentiments about Heated Tobacco Products on Social Media: Evaluation Study. J Med Internet Res. 2025 Mar 4;27:e63631. doi: 10.2196/63631.
8. The Accuracy and Capability of Artificial Intelligence Solutions in Health Care Examinations and Certificates: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Nov 5;26:e56532. doi: 10.2196/56532.
9. Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study. JMIR Med Inform. 2024 Sep 4;12:e59258. doi: 10.2196/59258.
10. Classifying the Information Needs of Survivors of Domestic Violence in Online Health Communities Using Large Language Models: Prediction Model Development and Evaluation Study. J Med Internet Res. 2025 May 12;27:e65397. doi: 10.2196/65397.

Cited By

1. Large Language Model Analysis of Reporting Quality of Randomized Clinical Trial Articles: A Systematic Review. JAMA Netw Open. 2025 Aug 1;8(8):e2529418. doi: 10.1001/jamanetworkopen.2025.29418.
2. Large language models in clinical nutrition: an overview of its applications, capabilities, limitations, and potential future prospects. Front Nutr. 2025 Aug 7;12:1635682. doi: 10.3389/fnut.2025.1635682. eCollection 2025.
3. Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR). J Med Syst. 2025 Jun 12;49(1):80. doi: 10.1007/s10916-025-02212-0.

References

1. Endorsements of five reporting guidelines for biomedical research by journals of prominent publishers. PLoS One. 2024 Feb 29;19(2):e0299806. doi: 10.1371/journal.pone.0299806. eCollection 2024.
2. Up Front and Open? Shrouded in Secrecy? Or Somewhere in Between? A Meta-Research Systematic Review of Open Science Practices in Sport Medicine Research. J Orthop Sports Phys Ther. 2023 Dec;53(12):735-747. doi: 10.2519/jospt.2023.12016.
3. Peer Review and Scientific Publication at a Crossroads: Call for Research for the 10th International Congress on Peer Review and Scientific Publication. JAMA. 2023 Oct 3;330(13):1232-1235. doi: 10.1001/jama.2023.17607.
4. Reminding Peer Reviewers of Reporting Guideline Items to Improve Completeness in Published Articles: Primary Results of 2 Randomized Trials. JAMA Netw Open. 2023 Jun 1;6(6):e2317651. doi: 10.1001/jamanetworkopen.2023.17651.
5. Artificial intelligence hallucinations. Crit Care. 2023 May 10;27(1):180. doi: 10.1186/s13054-023-04473-y.
6. Replication concerns in sports and exercise science: a narrative review of selected methodological issues in the field. R Soc Open Sci. 2022 Dec 14;9(12):220946. doi: 10.1098/rsos.220946. eCollection 2022 Dec.
7. Reporting and transparent research practices in sports medicine and orthopaedic clinical trials: a meta-research study. BMJ Open. 2022 Aug 8;12(8):e059347. doi: 10.1136/bmjopen-2021-059347.
8. The reporting standards of randomised controlled trials in leading medical journals between 2019 and 2020: a systematic review. Ir J Med Sci. 2023 Feb;192(1):73-80. doi: 10.1007/s11845-022-02955-6. Epub 2022 Mar 3.
9. Open and transparent sports science research: the role of journals to move the field forward. Knee Surg Sports Traumatol Arthrosc. 2022 Nov;30(11):3599-3601. doi: 10.1007/s00167-022-06893-9. Epub 2022 Jan 29.
10. Methods to assess research misconduct in health-related research: A scoping review. J Clin Epidemiol. 2021 Aug;136:189-202. doi: 10.1016/j.jclinepi.2021.05.012. Epub 2021 May 24.