


Concordance between humans and GPT-4 in appraising the methodological quality of case reports and case series using the Murad tool.

Affiliations

Evidence-based Practice Center, Kern Center for the Science of Healthcare Delivery, Mayo Clinic, 200 1st Street SW, Rochester, MN, 55905, USA.

Division of Public Health, Infectious Diseases and Occupational Medicine, Mayo Clinic, Rochester, MN, USA.

Publication information

BMC Med Res Methodol. 2024 Nov 4;24(1):266. doi: 10.1186/s12874-024-02372-6.

DOI:10.1186/s12874-024-02372-6
PMID:39497032
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11533388/
Abstract

BACKGROUND

Assessing the methodological quality of case reports and case series is challenging due to human judgment variability and time constraints. We evaluated the agreement in judgments between human reviewers and GPT-4 when applying a standard methodological quality assessment tool designed for case reports and series.

METHODS

We searched Scopus for systematic reviews published in 2023-2024 that cited the appraisal tool by Murad et al. A GPT-4 based agent was developed to assess methodological quality using the tool's 8 signaling questions. Observed agreement and an agreement coefficient were estimated by comparing the published judgments of human reviewers to the GPT-4 assessments.
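The article does not include its analysis code. As a rough sketch of the comparison described above, the snippet below computes observed agreement and one common chance-corrected statistic (Cohen's kappa; the paper's "agreement coefficient" is not named and may be a different estimator, such as Gwet's AC1). The `human` and `gpt4` judgment lists are hypothetical.

```python
from collections import Counter

def observed_agreement(human, model):
    """Fraction of items where the two raters give the same judgment."""
    assert len(human) == len(model)
    return sum(h == m for h, m in zip(human, model)) / len(human)

def cohens_kappa(human, model):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(human)
    po = observed_agreement(human, model)
    # Expected agreement if the two raters judged independently,
    # each following their own marginal distribution of answers.
    ch, cm = Counter(human), Counter(model)
    pe = sum(ch[c] * cm[c] for c in set(human) | set(model)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical yes/no judgments on one signaling question, per article.
human = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
gpt4  = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes"]

print(f"observed agreement: {observed_agreement(human, gpt4):.2%}")  # 75.00%
print(f"kappa: {cohens_kappa(human, gpt4):.3f}")
```

The gap between the two numbers illustrates why the paper reports both: raw percent agreement can look high even when much of it is expected by chance.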

RESULTS

We included 797 case reports and series. Observed agreement ranged between 41.91% and 80.93% across the eight questions (agreement coefficient ranged from 25.39% to 79.72%). The lowest agreement was noted in the first signaling question, about selection of cases. Agreement was similar for articles published in journals with impact factor < 5 vs. ≥ 5, and when excluding systematic reviews that did not use the 3 causality questions. Repeating the analysis with the same prompts demonstrated high agreement between the two GPT-4 attempts, except for the first question about selection of cases.

CONCLUSIONS

The study demonstrates moderate agreement between GPT-4 and human reviewers in assessing the methodological quality of case series and reports using the Murad tool. The current performance of GPT-4 seems promising but is unlikely to be sufficient for the rigor of a systematic review, so pairing the model with a human reviewer is required.
