Artificial Intelligence Meets Item Analysis (AI meets IA): A Study of Chatbot Training and Performance in detecting and correcting MCQ Flaws.

Authors

Sabqat Mashaal, Khan Rehan Ahmed, Jawaid Masood, Sajjad Madiha

Affiliations

Mashaal Sabqat, MBBS, MHPE. Assistant Professor of Medical Education, Islamic International Medical College; Assistant Director, Riphah Institute of Assessment, Riphah International University, Islamabad, Pakistan.

Rehan Ahmed Khan, MBBS, MHPE, FCPS, FRCS (Surgery), PhD ME. Dean, Riphah Institute of Assessment, Riphah International University, Islamabad, Pakistan; Professor of Surgery, Department of Surgery, Riphah International University, Rawalpindi, Pakistan.

Publication details

Pak J Med Sci. 2025 Mar;41(3):652-656. doi: 10.12669/pjms.41.3.11224.

DOI: 10.12669/pjms.41.3.11224
PMID: 40103875
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11911725/

Abstract

OBJECTIVE

To explore the potential of AI-powered chatbots, specifically ChatGPT, to identify and correct flaws in multiple-choice questions (MCQs).

METHODS

A three-phase interventional study was conducted from February to August 2023 at Riphah International University, Islamabad. In Phase 1, flawed MCQs were selected from the NBME item-writing guide and fed into ChatGPT, which identified item flaws and suggested corrections. In Phase 2, ChatGPT was trained to detect flaws in MCQs using text from the NBME item-writing guide. In Phase 3, ChatGPT was tested again on detecting flaws and correcting MCQs. Data were analyzed using SPSS version 26 and are presented as percentages; before-and-after performance was compared using McNemar's test with the exact conditional method.
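
The exact conditional form of McNemar's test named above can be sketched in a few lines. This is a minimal illustration, not the authors' SPSS procedure: the function name `mcnemar_exact` and the discordant-pair counts are hypothetical and are not taken from the paper. The test looks only at items whose outcome changed between the two phases and asks whether those changes split evenly, as a Binomial(n, 0.5) would predict.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact (conditional) McNemar p-value.

    b: items handled correctly before training but not after (hypothetical count)
    c: items handled correctly after training but not before (hypothetical count)
    Items with the same outcome in both phases do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of change in either direction.
        return 1.0
    k = min(b, c)
    # Under the null hypothesis, the smaller discordant count follows
    # Binomial(n, 0.5); sum the lower tail and double it for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical example: 2 items got worse, 8 items got better.
    print(mcnemar_exact(2, 8))  # 0.109375
    # Hypothetical example: changes split evenly, so there is no evidence of a shift.
    print(mcnemar_exact(3, 3))  # 1.0
```

With a small set of test items, only a few discordant pairs are available, so even a consistent shift in one direction can leave the p-value well above 0.05.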

RESULTS

ChatGPT could identify and correct flaws such as the use of "None of the above," "grammatical cues," "absolute terms," and "inconsistently presented numerical data." However, it struggled with flaws related to "complicated stems," "long or complex options," and "vague frequency terms." After training, ChatGPT became better at identifying and correcting flaws related to complicated stems and absolute terms. Both before and after training, it also struggled to recognize "nonparallel options," "convergence," and "word repetition." ChatGPT's performance deteriorated during peak hours. McNemar's test showed no statistically significant improvement in ChatGPT's ability to detect item flaws (p = 1.00) or to correct them (p = 0.125).

CONCLUSION

AI is revolutionizing industries and improving efficiency, but limitations exist in complex conversations, analysis, accuracy, and error prevention. Ongoing research is vital to unlocking AI's potential, especially in education.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5eaf/11911725/baffa8eecae7/PJMS-41-652-g001.jpg