
Assessing readability of explanations and reliability of answers by GPT-3.5 and GPT-4 in non-traumatic spinal cord injury education.

Author information

García-Rudolph Alejandro, Sanchez-Pinsach David, Wright Mark Andrew, Opisso Eloy, Vidal Joan

Affiliations

Departamento de Investigación e Innovación, Institut Guttmann, Institut Universitari de Neurorehabilitació adscrit a la UAB, Barcelona, Spain.

Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain.

Publication information

Med Teach. 2025 Jan 20:1-8. doi: 10.1080/0142159X.2024.2430365.

Abstract

PURPOSE

Our study aimed to: i) Assess the readability of textbook explanations using established indexes; ii) Compare these with GPT-4's default explanations, ensuring similar word counts for direct comparisons; iii) Evaluate GPT-4's adaptability by simplifying high-complexity explanations; iv) Determine the reliability of GPT-3.5 and GPT-4 in providing accurate answers.

MATERIAL AND METHODS

We utilized a textbook designed for American Board of Physical Medicine and Rehabilitation (ABPMR) certification. Our analysis covered 50 multiple-choice questions, each with a detailed explanation, focusing on non-traumatic spinal cord injury (NTSCI).

RESULTS

Our analysis revealed statistically significant differences in readability scores: the textbook achieved 14.5 (SD = 2.5) versus 17.3 (SD = 1.9) for GPT-4, indicating that GPT-4's explanations are generally more complex (p < 0.001). Using the Flesch Reading Ease Score, 86% of GPT-4's explanations fell into the 'Very difficult' category, significantly more than the textbook's 58% (p = 0.006). GPT-4 demonstrated adaptability by reducing the mean readability score of the nine most complex explanations while maintaining the word count. Regarding reliability, GPT-3.5 and GPT-4 scored 84% and 96% respectively, with GPT-4 outperforming GPT-3.5 (p = 0.046).
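For readers unfamiliar with these metrics, below is a minimal illustrative sketch of how the two readability measures cited above are commonly computed in Python, using the open-source textstat package. The abstract does not name the software the authors used or the specific grade-level index behind the 14.5 vs. 17.3 figures; this sketch assumes a Flesch-Kincaid-style grade level, and the sample text is hypothetical.

```python
# Common readability formulas (illustration only; not the authors' pipeline):
#   Flesch-Kincaid Grade Level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
#   Flesch Reading Ease        = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
import textstat

# Hypothetical explanation text, standing in for a textbook or GPT-4 answer.
explanation = (
    "Non-traumatic spinal cord injury may arise from neoplastic, infectious, "
    "vascular, or degenerative etiologies, each demanding distinct diagnostic "
    "and rehabilitative strategies."
)

grade = textstat.flesch_kincaid_grade(explanation)  # grade-level scale, cf. 14.5 vs. 17.3
ease = textstat.flesch_reading_ease(explanation)    # 0-100 scale; below 30 is the
                                                    # standard 'Very difficult' band
band = "Very difficult" if ease < 30 else "easier than 'Very difficult'"
print(f"Grade level: {grade:.1f}")
print(f"Reading ease: {ease:.1f} ({band})")
```

On the Flesch Reading Ease scale, higher scores indicate easier text, so the 'Very difficult' band (0-30) corresponds to dense, graduate-level prose; on grade-level indexes the direction is reversed, with higher scores meaning more years of schooling required.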

CONCLUSIONS

Our results confirm GPT-4's potential in medical education: it provided highly accurate yet often complex explanations for NTSCI, which it successfully simplified without losing accuracy.
