ChatGPT's performance in sample size estimation: a preliminary study on the capabilities of artificial intelligence.

Authors

Sebo Paul, Wang Ting

Affiliations

University Institute for Primary Care (IuMFE), University of Geneva, 1211 Geneva, Switzerland.

School of Library and Information Management, Emporia State University, Emporia, KS 66801, United States.

Publication

Fam Pract. 2025 Aug 14;42(5). doi: 10.1093/fampra/cmaf069.


DOI: 10.1093/fampra/cmaf069
PMID: 40910515
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12411907/
Abstract

BACKGROUND: Artificial intelligence tools, including large language models such as ChatGPT, are increasingly integrated into clinical and primary care research. However, their ability to assist with specialized statistical tasks, such as sample size estimation, remains largely unexplored.

METHODS: We evaluated the accuracy and reproducibility of ChatGPT-4.0 and ChatGPT-4o in estimating sample sizes across 24 standard statistical scenarios. Examples were selected from a statistical textbook and an educational website, covering basic methods such as estimating means, proportions, and correlations. Each example was tested twice per model. Models were accessed through the ChatGPT web interface, with a new independent chat session initiated for each round. Accuracy was assessed using mean and median absolute percentage error compared with validated reference values. Reproducibility was assessed using symmetric mean and median absolute percentage error between rounds. Comparisons were performed using Wilcoxon signed-rank tests.

RESULTS: For ChatGPT-4.0 and ChatGPT-4o, absolute percentage errors ranged from 0% to 15.2% (except one case: 26.3%) and 0% to 14.3%, respectively, with most examples showing errors below 5%. ChatGPT-4o showed better accuracy than ChatGPT-4.0 (mean absolute percentage error: 3.1% vs. 4.1% in round #1, P-value = .01; 2.8% vs. 5.1% in round #2, P-value = .02) and lower symmetric mean absolute percentage error (0.8% vs. 2.5%), though this difference was not significant (P-value = .18).

CONCLUSIONS: ChatGPT-4.0 and ChatGPT-4o provided reasonably accurate sample size estimates across standard scenarios, with good reproducibility. However, inconsistencies were observed, underscoring the need for cautious interpretation and expert validation. Further research should assess performance in more complex contexts and across a broader range of AI models.
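The abstract names the metrics but not their formulas. The sketch below shows how the accuracy metric (absolute percentage error against a validated reference) and the reproducibility metric (symmetric absolute percentage error between two rounds) could be computed, assuming the conventional definitions of these measures; the example values are hypothetical and not taken from the paper.

# Minimal sketch of the error metrics described in the abstract.
# Assumes the conventional APE and symmetric APE formulas; the abstract
# does not spell them out, so treat the exact definitions as assumptions.

from statistics import mean, median

def absolute_percentage_error(estimate: float, reference: float) -> float:
    """APE of a model estimate against the validated reference value, in %."""
    return abs(estimate - reference) / reference * 100

def symmetric_absolute_percentage_error(round1: float, round2: float) -> float:
    """Symmetric APE between two rounds of the same prompt, in % (reproducibility)."""
    return abs(round1 - round2) / ((abs(round1) + abs(round2)) / 2) * 100

# Hypothetical sample size estimates for three scenarios (illustration only).
references = [128, 385, 64]
round1_estimates = [130, 385, 60]
round2_estimates = [128, 380, 60]

ape_round1 = [absolute_percentage_error(e, r) for e, r in zip(round1_estimates, references)]
sape = [symmetric_absolute_percentage_error(a, b) for a, b in zip(round1_estimates, round2_estimates)]

print(f"Mean APE (round 1): {mean(ape_round1):.1f}%, median: {median(ape_round1):.1f}%")
print(f"Mean symmetric APE between rounds: {mean(sape):.1f}%")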


Similar Articles

[1]
ChatGPT's performance in sample size estimation: a preliminary study on the capabilities of artificial intelligence.

Fam Pract. 2025-8-14

[2]
Evaluating ChatGPT's Utility in Biologic Therapy for Systemic Lupus Erythematosus: Comparative Study of ChatGPT and Google Web Search.

JMIR Form Res. 2025-8-28

[3]
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025-5-20

[4]
Using Artificial Intelligence ChatGPT to Access Medical Information About Chemical Eye Injuries: Comparative Study.

JMIR Form Res. 2025-8-13

[5]
The performance of ChatGPT on medical image-based assessments and implications for medical education.

BMC Med Educ. 2025-8-23

[6]
Artificial Intelligence Chatbots in Pediatric Emergencies: A Reliable Lifeline or a Risk?

Cureus. 2025-8-1

[7]
AI in Medical Questionnaires: Innovations, Diagnosis, and Implications.

J Med Internet Res. 2025-6-23

[8]
Assessing ChatGPT's Educational Potential in Lung Cancer Radiotherapy From Clinician and Patient Perspectives: Content Quality and Readability Analysis.

JMIR Cancer. 2025-8-13

[9]
Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study.

JMIR Med Inform. 2025-8-13

[10]
Clinical Performance and Communication Skills of ChatGPT Versus Physicians in Emergency Medicine: Simulated Patient Study.

JMIR Med Inform. 2025-7-17

References Cited in This Article

[1]
Large Language Models in Healthcare and Medical Applications: A Review.

Bioengineering (Basel). 2025-6-10

[2]
Large Language Models in Medicine: Applications, Challenges, and Future Directions.

Int J Med Sci. 2025-5-31

[3]
A Practical Guide to the Utilization of ChatGPT in the Emergency Department: A Systematic Review of Current Applications, Future Directions, and Limitations.

Cureus. 2025-4-6

[4]
Novel AI applications in systematic review: GPT-4 assisted data extraction, analysis, review of bias.

BMJ Evid Based Med. 2025-4-8

[5]
A Review of Large Language Models in Medical Education, Clinical Decision Support, and Healthcare Administration.

Healthcare (Basel). 2025-3-10

[6]
ChatGPT and Other Large Language Models in Medical Education - Scoping Literature Review.

Med Sci Educ. 2024-11-13

[7]
Benefits, limits, and risks of ChatGPT in medicine.

Front Artif Intell. 2025-1-30

[8]
Examination of ChatGPT's Performance as a Data Analysis Tool.

Educ Psychol Meas. 2025-1-3

[9]
Use of ChatGPT to Explore Gender and Geographic Disparities in Scientific Peer Review.

J Med Internet Res. 2024-12-9

[10]
Exploring ChatGPT in clinical inquiry: a scoping review of characteristics, applications, challenges, and evaluation.

Ann Med Surg (Lond). 2024-11-8
