

Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study.

Author Information

Chow Julie Chi, Cheng Teng Yun, Chien Tsair-Wei, Chou Willy

Affiliations

Department of Pediatrics, Chi Mei Medical Center, Tainan, Taiwan.

Department of Pediatrics, School of Medicine, College of Medicine, Chung Shan Medical University, Taichung, Taiwan.

Publication Information

JMIR Form Res. 2024 Aug 8;8:e46800. doi: 10.2196/46800.


DOI:10.2196/46800
PMID:39115919
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11346125/
Abstract

BACKGROUND: ChatGPT (OpenAI), a state-of-the-art large language model, has exhibited remarkable performance in various specialized applications. Despite the growing popularity and efficacy of artificial intelligence, few studies have assessed ChatGPT's competence in answering multiple-choice questions (MCQs) using the KIDMAP of Rasch analysis, a web-based tool for evaluating MCQ-answering performance.

OBJECTIVE: This study aims to (1) showcase the utility of the website (Rasch analysis, specifically RaschOnline), and (2) determine the grade achieved by ChatGPT when compared with a normal sample.

METHODS: ChatGPT's capability was evaluated using 10 items from the English tests of the 2023 Taiwan college entrance examinations. Under a Rasch model, 300 students with normally distributed abilities were simulated to compete with ChatGPT's responses. RaschOnline was used to generate 5 visual presentations, including item difficulties, differential item functioning, item characteristic curves, a Wright map, and a KIDMAP, to address the research objectives.

RESULTS: The findings revealed the following: (1) the difficulty of the 10 items increased monotonically from easier to harder, represented in logits (-2.43, -1.78, -1.48, -0.64, -0.1, 0.33, 0.59, 1.34, 1.7, and 2.47); (2) evidence of differential item functioning between gender groups was observed for item 5 (P=.04); (3) item 5 displayed a good fit to the Rasch model (P=.61); (4) all items demonstrated a satisfactory fit to the Rasch model, indicated by infit mean square errors below the threshold of 1.5; (5) no significant difference was found in the measures between gender groups (P=.83); (6) a significant difference was observed among ability grades (P<.001); and (7) ChatGPT's capability was graded A, surpassing grades B to E.

CONCLUSIONS: Using RaschOnline, this study provides evidence that ChatGPT can achieve a grade of A when compared with a normal sample. It exhibits excellent proficiency in answering MCQs from the English tests of the 2023 Taiwan college entrance examinations.
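The simulation described in the Methods section can be sketched as follows. This is a minimal illustration of a dichotomous Rasch model, not the authors' RaschOnline implementation; the ability distribution (mean 0, SD 1) and the random seed are assumptions, while the 10 item difficulties are taken from the Results.

```python
import math
import random

# Item difficulties in logits, as reported in the Results section
difficulties = [-2.43, -1.78, -1.48, -0.64, -0.1,
                0.33, 0.59, 1.34, 1.7, 2.47]

def p_correct(theta, b):
    """Dichotomous Rasch model: P(X=1 | theta, b) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

rng = random.Random(0)  # fixed seed for reproducibility (assumption)

# 300 simulated examinees with normally distributed abilities (theta ~ N(0, 1))
abilities = [rng.gauss(0.0, 1.0) for _ in range(300)]

# 300 x 10 dichotomous response matrix (1 = correct, 0 = incorrect)
responses = [[1 if rng.random() < p_correct(theta, b) else 0 for b in difficulties]
             for theta in abilities]

raw_scores = [sum(row) for row in responses]
```

A response matrix like this is the input that Rasch software uses to estimate person measures and produce displays such as the Wright map and KIDMAP; when ability equals item difficulty (theta = b), the model gives a 50% chance of a correct answer.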


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/6a9c6b57afd6/formative_v8i1e46800_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/a20b40aa64da/formative_v8i1e46800_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/dec8bd2ecb87/formative_v8i1e46800_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/0b71d7e2204a/formative_v8i1e46800_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/523f7c256608/formative_v8i1e46800_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c486/11346125/80679cdb41d7/formative_v8i1e46800_fig6.jpg

Similar Articles

[1]
Assessing ChatGPT's Capability for Multiple Choice Questions Using RaschOnline: Observational Study.

JMIR Form Res. 2024-8-8

[2]
Assessing ChatGPT's capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis.

Medicine (Baltimore). 2023-6-23

[3]
ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.

Front Med (Lausanne). 2023-12-13

[4]
Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial.

J Med Internet Res. 2024-8-20

[5]
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024-7-25

[6]
Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam.

Int J Nurs Stud. 2024-5

[7]
How does ChatGPT-4 preform on non-English national medical licensing examination? An evaluation in Chinese language.

PLOS Digit Health. 2023-12-1

[8]
Appraisal of ChatGPT's Aptitude for Medical Education: Comparative Analysis With Third-Year Medical Students in a Pulmonology Examination.

JMIR Med Educ. 2024-7-23

[9]
Evaluating ChatGPT's effectiveness and tendencies in Japanese internal medicine.

J Eval Clin Pract. 2024-9

[10]
Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.

JMIR Med Educ. 2024-2-9

Cited By

[1]
Enhancing English abstract quality for non-English speaking authors using ChatGPT: A comparative study of Taiwan, Japan, China, and South Korea with slope graphs.

Medicine (Baltimore). 2024-10-4

References

[1]
ChatGPT in healthcare: A taxonomy and systematic review.

Comput Methods Programs Biomed. 2024-3

[2]
Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.

JMIR Med Educ. 2023-6-29

[3]
Evaluating GPT as an Adjunct for Radiologic Decision Making: GPT-4 Versus GPT-3.5 in a Breast Imaging Pilot.

J Am Coll Radiol. 2023-10

[4]
Assessing ChatGPT's capacity for clinical decision support in pediatrics: A comparative study with pediatricians using KIDMAP of Rasch analysis.

Medicine (Baltimore). 2023-6-23

[5]
Trialling a Large Language Model (ChatGPT) in General Practice With the Applied Knowledge Test: Observational Study Demonstrating Opportunities and Limitations in Primary Care.

JMIR Med Educ. 2023-4-21

[6]
Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine.

N Engl J Med. 2023-3-30

[7]
ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns.

Healthcare (Basel). 2023-3-19

[8]
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.

PLOS Digit Health. 2023-2-9

[9]
ChatGPT passing USMLE shines a spotlight on the flaws of medical education.

PLOS Digit Health. 2023-2-9

[10]
The future of medical education and research: Is ChatGPT a blessing or blight in disguise?

Med Educ Online. 2023-12
