Artificial Intelligence Meets Item Analysis (AI meets IA): A Study of Chatbot Training and Performance in detecting and correcting MCQ Flaws.

Authors

Sabqat Mashaal, Khan Rehan Ahmed, Jawaid Masood, Sajjad Madiha

Affiliations

Mashaal Sabqat, MBBS, MHPE Assistant Professor Medical Education, Islamic International Medical College, Assistant Director Riphah Institute of Assessment, Riphah International University, Islamabad, Pakistan.

Rehan Ahmed Khan, MBBS, MHPE, FCPS, FRCS (Surgery), PhD ME Dean Riphah Institute of Assessment, Riphah International University, Islamabad, Pakistan. Professor of Surgery, Department of Surgery, Riphah International University, Rawalpindi, Pakistan.

Publication information

Pak J Med Sci. 2025 Mar;41(3):652-656. doi: 10.12669/pjms.41.3.11224.

Abstract

OBJECTIVE

To explore the potential of AI-powered chatbots, specifically ChatGPT, in identifying and correcting flaws in MCQs.

METHODS

A three-phase interventional study was conducted from February to August 2023 at Riphah International University, Islamabad. In Phase 1, flawed MCQs were selected from the NBME guide and fed into ChatGPT, which identified item flaws and suggested corrections. In Phase 2, ChatGPT was trained to detect flaws in MCQs using text from the NBME item-writing guide. In Phase 3, ChatGPT was tested again on detecting flaws and correcting MCQs. Data were analyzed using SPSS version 26 and presented as percentages; McNemar's test with the exact conditional method was used to compare performance.

RESULTS

ChatGPT could identify and correct flaws such as use of "none of the above," "grammatical cues," "absolute terms," and "inconsistently presented numerical data." However, it struggled with flaws related to "complicated stems," "long or complex options," and "vague frequency terms." After training, ChatGPT became better at identifying and correcting flaws related to complicated stems and absolute terms. Both before and after training, it also struggled to recognize "nonparallel options," "convergence," and "word repetition." ChatGPT's performance deteriorated during peak hours. The test of significance showed no statistically significant improvement after training in ChatGPT's detection of item flaws (p = 1.00) or in its correction of them (p = 0.125).
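As an illustration of the test named in the Methods, the following is a minimal Python sketch of a two-sided exact McNemar test for paired before/after outcomes. The function name `exact_mcnemar_p` and the discordant counts passed to it are hypothetical and are not taken from the study, which reports only the resulting p-values; the study itself used SPSS rather than this code.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact (binomial) McNemar p-value.

    b and c are the two discordant-pair counts: items handled correctly
    only before training, and items handled correctly only after training.
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, NOT taken from the study.
print(exact_mcnemar_p(0, 4))  # 0.125
print(exact_mcnemar_p(2, 2))  # 1.0
```

Under this formulation, equal discordant counts give p = 1.0, and four discordant pairs all in one direction give p = 0.125; these are example configurations consistent with the reported values, not the study's actual counts. The sketch also shows why small item sets limit such a test: with fewer than six discordant pairs, the two-sided exact p-value cannot fall below 0.05.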

CONCLUSION

AI is revolutionizing industries and improving efficiency, but limitations exist in complex conversations, analysis, accuracy, and error prevention. Ongoing research is vital to unlocking AI's potential, especially in education.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5eaf/11911725/baffa8eecae7/PJMS-41-652-g001.jpg
