Large Language Model Analysis of Reporting Quality of Randomized Clinical Trial Articles: A Systematic Review.

Authors

Srinivasan Apoorva, Berkowitz Jacob, Friedrich Nadine A, Kivelson Sophia, Tatonetti Nicholas P

Affiliations

Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California.

Cedars Sinai Cancer, Cedars Sinai Medical Center, Los Angeles, California.

Publication information

JAMA Netw Open. 2025 Aug 1;8(8):e2529418. doi: 10.1001/jamanetworkopen.2025.29418.

DOI: 10.1001/jamanetworkopen.2025.29418
PMID: 40875232
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12395317/

Abstract

IMPORTANCE

Incomplete reporting in randomized clinical trials (RCTs) obscures bias and limits reproducibility. Manual audits for adherence to the Consolidated Standards of Reporting Trials (CONSORT) guideline cannot keep pace with publication volume.

OBJECTIVES

To build and validate a zero-shot large-language-model (LLM) pipeline for automated CONSORT assessment and to map reporting quality over time, biomedical disciplines, and trial features.

DESIGN, SETTING, AND PARTICIPANTS

This systematic review included RCTs that were indexed on PubMed, available in English, open access, conducted in human participants, and published between MONTH 1966 and MONTH 2024. PubMed PDFs were converted to XML and linked with Semantic Scholar and ClinicalTrials.gov metadata. Chat GPT-4o-mini was tested on the 50-article CONSORT-Text Classification Model (CONSORT-TM) benchmark, checked by experts in 70 randomly sampled RCTs, and then applied to the full sample.
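
As an illustration of what a zero-shot assessment step might look like, the following Python sketch asks one yes/no question per CONSORT item without giving the model any labeled examples. It is not the authors' pipeline: the prompt wording, the item descriptions, and the names `CONSORT_ITEMS`, `build_prompt`, `parse_answer`, `assess_article`, and `fake_model` are all assumptions made for this example, and only the two items named in the abstract are included. The model call is passed in as a plain function so that no particular API is implied.

```python
from typing import Callable, Dict

# Hypothetical item descriptions. The study used 21 CONSORT items;
# only the two named in the abstract are sketched here.
CONSORT_ITEMS: Dict[str, str] = {
    "allocation_concealment": "the mechanism used to conceal the allocation sequence",
    "external_validity": "a discussion of the external validity of the findings",
}


def build_prompt(item_description: str, article_text: str) -> str:
    """Build a zero-shot prompt: instructions and article only, no labeled examples."""
    return (
        "You are assessing a randomized clinical trial report.\n"
        f"Does the article report {item_description}?\n"
        "Answer with exactly one word: yes or no.\n\n"
        f"Article:\n{article_text}"
    )


def parse_answer(raw: str) -> bool:
    """Map a free-text model reply to a boolean; anything not starting with 'yes' is False."""
    return raw.strip().lower().startswith("yes")


def assess_article(article_text: str, ask_model: Callable[[str], str]) -> Dict[str, bool]:
    """Ask one question per item and collect the yes/no decisions."""
    return {
        key: parse_answer(ask_model(build_prompt(description, article_text)))
        for key, description in CONSORT_ITEMS.items()
    }


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call, used only to make the sketch runnable."""
    question_line = prompt.split("\n")[1]
    asks_concealment = "conceal" in question_line
    return "Yes." if asks_concealment and "sealed envelopes" in prompt else "No."


decisions = assess_article("Allocation used sealed envelopes.", fake_model)
print(decisions)
```

In practice `fake_model` would be replaced by a function that sends the prompt to the chosen model and returns its reply as a string.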

EXPOSURE

Publication year, biomedical discipline, funding source, trial phase, US Food and Drug Administration regulation, and oversight features.

MAIN OUTCOMES AND MEASURES

The LLM judged whether each of 21 CONSORT items was met. Primary outcomes were (1) model performance vs expert review (precision, recall, and macro F1 score) and (2) proportion of items reported.
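
The two primary outcomes can be made concrete with a small Python sketch. It assumes that macro F1 is the unweighted mean of per-item F1 scores, which is the usual definition but is not spelled out in the abstract, and it uses made-up toy labels rather than study data. The function names and item keys are hypothetical.

```python
from typing import Dict, List, Tuple


def precision_recall_f1(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one item, treating 'reported' as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(gold_by_item: Dict[str, List[bool]], pred_by_item: Dict[str, List[bool]]) -> float:
    """Unweighted mean of per-item F1 scores (assumed definition of macro F1)."""
    scores = [
        precision_recall_f1(gold_by_item[item], pred_by_item[item])[2]
        for item in gold_by_item
    ]
    return sum(scores) / len(scores)


def proportion_reported(item_flags: List[bool]) -> float:
    """Share of checklist items judged as reported for a single article."""
    return sum(item_flags) / len(item_flags)


# Toy labels for four articles and two items (illustrative only, not study data).
gold = {
    "allocation_concealment": [True, False, True, False],
    "external_validity": [False, False, True, True],
}
pred = {
    "allocation_concealment": [True, False, False, False],
    "external_validity": [False, True, True, True],
}

print(round(macro_f1(gold, pred), 4))
print(proportion_reported([True, False, True, True]))
```

Averaging per item gives rarely reported items the same weight as common ones, so a model cannot score well by being accurate only on frequently reported items.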

RESULTS

Of 53 137 screened PDFs, 21 041 RCTs (median [IQR] publication year, 2014 [2003-2020]; 30 disciplines) were included, with a registry-linked subset of 1790 RCTs that had a median (IQR) planned enrollment of 210 (95-440) participants. In the 70-article validation set (2210 decisions), LLM outputs matched experts 91.7% of the time (2026 of 2210 decisions); the macro F1 score on CONSORT-TM was 0.86 (95% CI, 0.84-0.87). Mean CONSORT compliance increased from 27.3% (95% CI, 27.0%-27.6%) in 1966 to 1990 to 57.0% (95% CI, 56.8%-57.2%) in 2010 to 2024. However, reporting of critical elements remained uncommon, such as allocation-concealment mechanism (16.1% [95% CI, 15.6%-16.6%]) and external-validity discussion (1.6% [95% CI, 1.5%-1.8%]). Compliance varied across disciplines from 35.2% (95% CI, 34.8%-35.6%) in pharmacology to 63.4% (95% CI, 62.1%-64.7%) in urology and showed only negligible associations with clinical trial characteristics (all Cramér V <0.10).
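
Two of the quantities above can be reproduced or illustrated with a short Python sketch. The agreement rate uses the counts stated in the abstract. The Cramér V function is the standard statistic computed from a contingency table and assumes every row and column total is non-zero; the example tables are invented for illustration and are not study data.

```python
import math
from typing import List

# Agreement between LLM and experts, from the counts given in the abstract.
matched, total = 2026, 2210
agreement = matched / total
print(f"{100 * agreement:.1f}%")  # 91.7%


def cramers_v(table: List[List[int]]) -> float:
    """Cramér V for an r x c contingency table of counts.

    Assumes all row and column totals are non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Invented tables: rows = trial feature (yes/no), columns = item reported (yes/no).
no_association = [[50, 50], [50, 50]]
perfect_association = [[10, 0], [0, 10]]
print(cramers_v(no_association))
print(cramers_v(perfect_association))
```

Cramér V ranges from 0 (no association) to 1 (perfect association), so values below 0.10 indicate that trial characteristics explained very little of the variation in compliance.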

CONCLUSIONS AND RELEVANCE

In this systematic review of RCTs, a zero-shot LLM audited CONSORT adherence at scale, uncovering persistent reporting gaps and wide variation across biomedical disciplines, underscoring the need for targeted editorial action to boost transparency and reproducibility.
