Directorate of Graduate Studies, The Aga Khan University, Karachi, Pakistan.
Department of Science of Dental Materials, Hamdard College of Medicine & Dentistry, Hamdard University, Karachi, Pakistan.
J Coll Physicians Surg Pak. 2024 May;34(5):595-599. doi: 10.29271/jcpsp.2024.05.595.
To analyse and compare manual (human) and machine-based assessment and grading of formative essays.
Quasi-experimental, qualitative cross-sectional study. Place and Duration of the Study: Department of Science of Dental Materials, Hamdard College of Medicine & Dentistry, Hamdard University, Karachi, from February to April 2023.
Ten short formative essays by final-year dental students were manually assessed and graded. The same essays were then graded using ChatGPT (version 3.5). The prompts and chatbot responses were recorded and matched against the manually graded essays, and a qualitative analysis of the chatbot responses was performed.
Four different prompts were given to the artificial intelligence (AI)-driven ChatGPT platform to grade the formative essays, eliciting four responses: the chatbot's initial response without grading criteria, its response when asked to grade against the criteria, its response to criterion-wise grading, and its response when questioned about the difference in grading. Based on the results, four ways of using AI and machine learning (ML) are proposed for medical educators: automated grading, content analysis, plagiarism detection, and formative assessment. In contrast to manual grading, ChatGPT provided a comprehensive report with feedback on writing skills.
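The four-prompt protocol described above can be sketched as a set of prompt templates applied to each essay in turn. The template wording and the `build_prompts` helper below are illustrative assumptions for exposition; the study does not publish its exact prompts:

```python
# Hypothetical sketch of the study's four-prompt grading sequence.
# Template wording is assumed, not quoted from the paper.

ESSAY_PROMPTS = {
    # 1. Initial response without grading criteria
    "initial": "Please review the following student essay and comment on it:\n{essay}",
    # 2. Grading against the criteria as a whole
    "grade_against_criteria": (
        "Grade the following essay against these criteria and give an overall mark.\n"
        "Criteria: {criteria}\nEssay:\n{essay}"
    ),
    # 3. Criterion-wise grading
    "criterion_wise": (
        "Grade the following essay criterion by criterion, giving a mark and a "
        "one-line justification for each.\nCriteria: {criteria}\nEssay:\n{essay}"
    ),
    # 4. Querying the difference from the manual grade
    "explain_difference": (
        "Your grade differs from the manual grade of {manual_grade}. "
        "Explain possible reasons for this difference."
    ),
}

def build_prompts(essay: str, criteria: str, manual_grade: str) -> dict:
    """Fill every template for one essay; unused placeholders are simply ignored."""
    return {
        name: tpl.format(essay=essay, criteria=criteria, manual_grade=manual_grade)
        for name, tpl in ESSAY_PROMPTS.items()
    }
```

Each filled prompt would then be submitted to the chatbot in sequence, with the responses recorded for comparison with the manual grades.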
The chatbot's responses were fascinating and thought-provoking. AI and ML technologies could supplement human grading in the assessment of essays. Medical educators need to embrace AI and ML technology to enhance the standards and quality of medical education, particularly when assessing long and short essay-type questions. Further empirical research and evaluation are needed to confirm their effectiveness.
Machine learning, Artificial intelligence, Essays, ChatGPT, Formative assessment.