Is ChatGPT 'ready' to be a learning tool for medical undergraduates and will it perform equally in different subjects? Comparative study of ChatGPT performance in tutorial and case-based learning questions in physiology and biochemistry.

Author information

Luke W A Nathasha V, Seow Chong Lee, Ban Kenneth H, Wong Amanda H, Zhi Xiong Chen, Shuh Shing Lee, Taneja Reshma, Samarasekera Dujeepa D, Yap Celestial T

Affiliations

Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore.

Publication information

Med Teach. 2024 Nov;46(11):1441-1447. doi: 10.1080/0142159X.2024.2308779. Epub 2024 Jan 31.

Abstract

PURPOSE

Generative AI will become an integral part of education in the future. The potential of this technology in different disciplines should be identified to promote effective adoption. This study evaluated the performance of ChatGPT on tutorial and case-based learning questions in physiology and biochemistry for medical undergraduates. Our study focused mainly on the GPT-3.5 version, while a subgroup of questions was comparatively assessed on GPT-3.5 and GPT-4.

MATERIALS AND METHODS

Answers were generated in GPT-3.5 for 44 modified essay questions (MEQs) in physiology and 43 MEQs in biochemistry. Each answer was graded by two independent examiners. Subsequently, a subset of 15 questions from each subject was selected to represent the different score categories of the GPT-3.5 answers; responses to these were generated in GPT-4 and graded.

RESULTS

The mean score for physiology answers was 74.7 (SD 25.96). GPT-3.5 demonstrated statistically significantly (p = .009) superior performance on lower-order questions of Bloom's taxonomy compared with higher-order questions. Deficiencies in the application of physiological principles in a clinical context were noted as a drawback. Scores in biochemistry were relatively lower, with a mean score of 59.3 (SD 26.9) for GPT-3.5. There was no statistically significant difference between the scores for higher- and lower-order questions of Bloom's taxonomy. The deficiencies highlighted were a lack of in-depth explanation and precision. In the subset of questions on which GPT-4 and GPT-3.5 were compared, GPT-4 responses showed better overall performance in both subjects. This difference between GPT-3.5 and GPT-4 performance was statistically significant in biochemistry but not in physiology.

CONCLUSIONS

The differences in performance between the two versions, GPT-3.5 and GPT-4, and across the disciplines are noteworthy. Educators and students should understand the strengths and limitations of this technology in different fields in order to integrate it effectively into teaching and learning.

