
ChatGPT-4o can serve as the second rater for data extraction in systematic reviews.

Authors

Motzfeldt Jensen Mette, Brix Danielsen Mathias, Riis Johannes, Assifuah Kristjansen Karoline, Andersen Stig, Okubo Yoshiro, Jørgensen Martin Grønbech

Affiliations

Department of Geriatric Medicine, Aalborg University Hospital, Aalborg, Denmark.

Department of Clinical Medicine, Aalborg University, Aalborg, Denmark.

Publication information

PLoS One. 2025 Jan 7;20(1):e0313401. doi: 10.1371/journal.pone.0313401. eCollection 2025.

DOI: 10.1371/journal.pone.0313401
PMID: 39774443
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11706374/

Abstract

BACKGROUND

Systematic reviews bring clarity to large bodies of evidence and support the transfer of knowledge from clinical trials to guidelines, but they are time-consuming. Artificial intelligence (AI) tools such as ChatGPT-4o may streamline data extraction, but their efficacy requires validation.

OBJECTIVE

This study aims to (1) evaluate the validity of ChatGPT-4o for data extraction compared to human reviewers, and (2) test the reproducibility of ChatGPT-4o's data extraction.

METHODS

We conducted a comparative study using papers from an ongoing systematic review on exercise to reduce fall risk. Data extracted by ChatGPT-4o were compared to a reference standard: data extracted by two independent human reviewers. The validity was assessed by categorizing the extracted data into five categories ranging from completely correct to false data. Reproducibility was evaluated by comparing data extracted in two separate sessions using different ChatGPT-4o accounts.
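
The reproducibility check described above can be illustrated with a simple percent-agreement calculation. This is only a minimal sketch: the abstract does not state how two extracted values were judged to match, so exact string equality is an assumption here, and the function name and toy data are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def percent_agreement(session_a: Sequence[str], session_b: Sequence[str]) -> float:
    """Return the share (in %) of data points on which both sessions agree.

    Assumption: two extractions "agree" when the strings are identical.
    The study may have used a more lenient, human-judged matching rule.
    """
    if len(session_a) != len(session_b):
        raise ValueError("both sessions must cover the same data points")
    if not session_a:
        raise ValueError("no data points to compare")
    matches = sum(a == b for a, b in zip(session_a, session_b))
    return 100.0 * matches / len(session_a)


# Hypothetical toy data (not from the paper): four data points per session.
first = ["12 weeks", "n=40", "not reported", "RCT"]
second = ["12 weeks", "n=40", "NR", "RCT"]

print(percent_agreement(first, second))  # 75.0 (3 of 4 values match)
```

The third data point shows why agreement can drop when a paper does not report an item: the two sessions may word the absence differently, which strict equality counts as a disagreement.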

RESULTS

ChatGPT-4o extracted a total of 484 data points across 11 papers. The AI's data extraction was 92.4% accurate (95% CI: 89.5% to 94.5%) and produced false data in 5.2% of cases (95% CI: 3.4% to 7.4%). The reproducibility between the two sessions was high, with an overall agreement of 94.1%. Reproducibility decreased when information was not reported in the papers, with an agreement of 77.2%.
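
For readers who want to see how a confidence interval of this kind can be obtained, the sketch below computes a Wilson score interval for a proportion. Two things are assumptions: the abstract does not say which interval method the authors used, and the count of 447 correct data points is back-calculated from 92.4% of 484 rather than reported. The result should land close to the published bounds but need not match them exactly.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 447 is inferred from 92.4% of 484, not stated in the abstract.
low, high = wilson_interval(447, 484)
print(f"{low:.1%} to {high:.1%}")
```

The Wilson interval is chosen here only because it behaves well for proportions near 0 or 1; an exact (Clopper-Pearson) or bootstrap interval would give slightly different bounds.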

CONCLUSION

The validity and reproducibility of ChatGPT-4o were high for data extraction in systematic reviews. ChatGPT-4o qualified as a second reviewer for systematic reviews and showed potential for future advancements in summarizing data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b52e/11706374/175f249485cb/pone.0313401.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b52e/11706374/29dbaf6e6c2d/pone.0313401.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b52e/11706374/ae3a38432975/pone.0313401.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b52e/11706374/5a611de04e40/pone.0313401.g004.jpg


