
Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study.

Authors

Sercombe Jayden, Bryant Zachary, Wilson Jack

Affiliation

The Matilda Centre for Research in Mental Health and Substance Use, University of Sydney, Jane Foss Russell Building (G02), Level 6, Sydney, 2006, Australia, 612 8627 9380.

Publication

JMIR Form Res. 2025 Aug 11;9:e68666. doi: 10.2196/68666.

DOI:10.2196/68666
PMID:40789147
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12338963/

Abstract

BACKGROUND

Systematic reviews are essential for synthesizing research in health sciences; however, they are resource-intensive and prone to human error. The data extraction phase, in which key details of studies are identified and recorded in a systematic manner, may benefit from the application of automation processes. Recent advancements in artificial intelligence, specifically in large language models (LLMs) such as ChatGPT, may streamline this process.

OBJECTIVE

This study aimed to develop and evaluate a custom Generative Pre-trained Transformer (GPT), named Systematic Review Extractor Pro, for automating the data extraction phase of systematic reviews in health research.

METHODS

OpenAI's GPT Builder was used to create a GPT tailored to extract information from academic manuscripts. The Role, Instruction, Steps, End goal, and Narrowing (RISEN) framework was used to inform prompt engineering for the GPT. A sample of 20 studies from two distinct systematic reviews was used to evaluate the GPT's performance in extraction. Agreement rates between the GPT outputs and human reviewers were calculated for each study subsection.
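
The abstract does not state how agreement was scored, so the following is only a minimal sketch of one plausible calculation: the percentage of extracted fields in each subsection where the GPT output matches the human reviewer. The function names, the exact-match rule after whitespace and case normalization, and the sample records are all assumptions made for illustration, not the authors' method or data.

```python
from collections import defaultdict


def normalize(value):
    """Assumed matching rule: compare values ignoring case and outer whitespace."""
    return str(value).strip().casefold()


def agreement_rates(records):
    """Return percent agreement per subsection.

    records: iterable of (subsection, gpt_value, human_value) tuples.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for subsection, gpt_value, human_value in records:
        totals[subsection] += 1
        if normalize(gpt_value) == normalize(human_value):
            matches[subsection] += 1
    return {s: 100.0 * matches[s] / totals[s] for s in totals}


# Hypothetical records, invented purely to exercise the function.
sample = [
    ("study characteristics", "2021", "2021"),
    ("study characteristics", "Australia", "australia "),
    ("participant characteristics", "n = 120", "n = 120"),
    ("statistical results", "OR 1.42", "OR 1.24"),
]

rates = agreement_rates(sample)
for subsection, rate in rates.items():
    print(f"{subsection}: {rate:.2f}%")
```

In practice, free-text fields such as statistical results would likely need human adjudication rather than string comparison, which may be one reason agreement is lower in those subsections.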

RESULTS

The mean time for human data extraction was 36 minutes per study, compared to 26.6 seconds for GPT generation, followed by 13 minutes of human review. The GPT demonstrated high overall agreement rates with human reviewers, achieving 91.45% for review 1 and 89.31% for review 2. It was particularly accurate in extracting study characteristics (review 1: 95.25%; review 2: 90.83%) and participant characteristics (review 1: 95.03%; review 2: 90.00%), with lower performance observed in more complex areas such as methodological characteristics (87.07%) and statistical results (77.50%). Where the human reviewer was incorrect, the GPT extracted the correct data in 14 instances in review 1 (3.25%) and 4 instances in review 2 (1.16%).
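
As a worked check of the timing figures above, the sketch below computes the per-study time of the GPT-assisted workflow (generation plus human review) and the relative reduction against fully manual extraction. Only the three input values come from the abstract; the variable and function names are illustrative, and treating generation and review as strictly sequential is an assumption.

```python
# Mean per-study times reported in the abstract.
HUMAN_EXTRACTION_MIN = 36.0   # fully manual extraction
GPT_GENERATION_SEC = 26.6     # GPT output generation
HUMAN_REVIEW_MIN = 13.0       # human review of the GPT output


def assisted_minutes(gpt_sec, review_min):
    """Total minutes for the GPT-assisted workflow, assuming sequential steps."""
    return gpt_sec / 60.0 + review_min


def reduction_pct(baseline_min, assisted_min):
    """Relative time reduction versus the manual baseline, in percent."""
    return 100.0 * (baseline_min - assisted_min) / baseline_min


assisted = assisted_minutes(GPT_GENERATION_SEC, HUMAN_REVIEW_MIN)
reduction = reduction_pct(HUMAN_EXTRACTION_MIN, assisted)

print(f"GPT-assisted workflow: {assisted:.2f} min per study")
print(f"Reduction vs manual:   {reduction:.1f}%")
```

Under these assumptions the assisted workflow takes roughly 13.4 minutes per study, a reduction of about 63% relative to the 36-minute manual baseline.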

CONCLUSIONS

The custom GPT significantly reduced extraction time and shows evidence that it can extract data with high accuracy, particularly for participant and study characteristics. This tool may offer a viable option for researchers seeking to reduce resource demands during the extraction phase, although more research is needed to evaluate test-retest reliability, performance across broader review types, and accuracy in extracting statistical data. The tool developed in the current study has been made open access.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb34/12338963/a6a635469065/formative-v9-e68666-g001.jpg

Similar articles

1. Evaluating a Customized Version of ChatGPT for Systematic Review Data Extraction in Health Research: Development and Usability Study.
   JMIR Form Res. 2025 Aug 11;9:e68666. doi: 10.2196/68666.
2. Prescription of Controlled Substances: Benefits and Risks
3. Use of ChatGPT Large Language Models to Extract Details of Recommendations for Additional Imaging From Free-Text Impressions of Radiology Reports.
   AJR Am J Roentgenol. 2025 Apr;224(4):e2432341. doi: 10.2214/AJR.24.32341. Epub 2025 Jan 29.
4. Automated Paper Screening for Clinical Reviews Using Large Language Models: Data Analysis Study.
   J Med Internet Res. 2024 Jan 12;26:e48996. doi: 10.2196/48996.
5. [Volume and health outcomes: evidence from systematic reviews and from evaluation of Italian hospital data].
   Epidemiol Prev. 2013 Mar-Jun;37(2-3 Suppl 2):1-100.
6. Human-Comparable Sensitivity of Large Language Models in Identifying Eligible Studies Through Title and Abstract Screening: 3-Layer Strategy Using GPT-3.5 and GPT-4 for Systematic Reviews.
   J Med Internet Res. 2024 Aug 16;26:e52758. doi: 10.2196/52758.
7. Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
   JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.
8. Eliciting adverse effects data from participants in clinical trials.
   Cochrane Database Syst Rev. 2018 Jan 16;1(1):MR000039. doi: 10.1002/14651858.MR000039.pub2.
9. Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.
   JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.
10. Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.
    Health Technol Assess. 2008 Jun;12(28):iii-iv, ix-95. doi: 10.3310/hta12280.