
Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.

Author information

García-Rudolph Alejandro, Sanchez-Pinsach David, Wright Mark Andrew, Opisso Eloy, Vidal Joan

Affiliations

Departamento de Investigación e Innovación, Institut Guttmann, Institut Universitari de Neurorehabilitació adscrit a la UAB, Barcelona, Spain.

Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain.

Publication information

Med Teach. 2025 Jan 20:1-8. doi: 10.1080/0142159X.2024.2430365.

Abstract

PURPOSE

Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.

MATERIAL AND METHODS

We utilized a textbook designed for American Board of Physical Medicine and Rehabilitation (ABPMR) certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).

RESULTS

Our analysis revealed statistically significant differences in readability scores: the textbook achieved 14.5 (SD = 2.5) versus 17.3 (SD = 1.9) for GPT-4, indicating that GPT-4's explanations are generally more complex (p < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly more than the textbook's 58% (p = 0.006). GPT-4 demonstrated adaptability by reducing the mean readability score of the nine most complex explanations while maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (p = 0.046).
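For readers unfamiliar with these metrics, below is a minimal illustrative sketch of how the two readability measures cited above are commonly computed in Python, using the open-source textstat package. The abstract does not name the software the authors used or the specific grade-level index behind the 14.5 vs. 17.3 figures; this sketch assumes a Flesch-Kincaid-style grade level, and the sample text is hypothetical.

```python
# Common readability formulas (illustration only; not the authors' pipeline):
#   Flesch-Kincaid Grade Level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
#   Flesch Reading Ease        = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
import textstat

# Hypothetical explanation text, standing in for a textbook or GPT-4 answer.
explanation = (
    "Non-traumatic spinal cord injury may arise from neoplastic, infectious, "
    "vascular, or degenerative etiologies, each demanding distinct diagnostic "
    "and rehabilitative strategies."
)

grade = textstat.flesch_kincaid_grade(explanation)  # grade-level scale, cf. 14.5 vs. 17.3
ease = textstat.flesch_reading_ease(explanation)    # 0-100 scale; below 30 is the
                                                    # standard 'Very difficult' band
band = "Very difficult" if ease < 30 else "easier than 'Very difficult'"
print(f"Grade level: {grade:.1f}")
print(f"Reading ease: {ease:.1f} ({band})")
```

On the Flesch Reading Ease scale, higher scores indicate easier text, so the 'Very difficult' band (0-30) corresponds to dense, graduate-level prose; on grade-level indexes the direction is reversed, with higher scores meaning more years of schooling required.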

CONCLUSIONS

Our results confirm GPT-4's potential in medical education: it provided highly accurate yet often complex explanations for NTSCI, which it successfully simplified without losing accuracy.
