Güvel Muhammed Cihan, Kıyak Yavuz Selim, Varan Hacer Doğan, Sezenöz Burak, Coşkun Özlem, Uluoğlu Canan
Department of Medical Pharmacology, Gazi University Faculty of Medicine, Ankara, Turkey.
Department of Medical Education and Informatics, Gazi University Faculty of Medicine, Ankara, Turkey.
Eur J Clin Pharmacol. 2025 Jun;81(6):875-883. doi: 10.1007/s00228-025-03838-2. Epub 2025 Apr 9.
This study evaluated the performance of three generative AI models (ChatGPT-4o, Gemini 1.5 Advanced Pro, and Claude 3.5 Sonnet) in producing case-based rational pharmacology questions, compared to expert educators.
Using one-shot prompting, 60 questions (20 per model) addressing essential hypertension and type 2 diabetes topics were generated. A multidisciplinary panel categorized the questions by usability (no revisions needed, minor or major revisions required, or unusable). Subsequently, 24 AI-generated and 8 expert-created questions were administered to 103 medical students in a real-world exam setting. Performance metrics, including correct response rate, discrimination index, and identification of nonfunctional distractors, were analyzed.
No statistically significant differences were found between AI-generated and expert-created questions, with mean correct response rates surpassing 50% and discrimination indices consistently equal to or above 0.20. Claude produced the highest proportion of error-free items (12/20), whereas ChatGPT exhibited the fewest unusable items (5/20). Expert revisions required approximately one minute per AI-generated question, representing a substantial efficiency gain over manual question preparation. Nonetheless, 19 of the 60 AI-generated questions were deemed unusable, highlighting the necessity of expert oversight.
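The item-analysis metrics reported above can be illustrated with a short sketch. This is not the authors' analysis code; the function names, the upper-lower 27% split for the discrimination index, and the 5% threshold for flagging nonfunctional distractors are common conventions assumed here for illustration, and all data in the usage comments are invented.

```python
# Hypothetical sketch of standard item-analysis metrics (not the study's code).
# Assumed conventions: upper/lower 27% groups for discrimination,
# <5% selection rate marks a nonfunctional distractor.

def correct_response_rate(responses, key):
    """Fraction of examinees choosing the keyed answer."""
    return sum(r == key for r in responses) / len(responses)

def discrimination_index(total_scores, item_correct, frac=0.27):
    """p(correct) in the top-scoring group minus p(correct) in the bottom group.

    total_scores: overall exam score per student.
    item_correct: 1/0 per student for this item, same order.
    """
    n = len(total_scores)
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = max(1, round(n * frac))
    p_upper = sum(item_correct[i] for i in order[:k]) / k
    p_lower = sum(item_correct[i] for i in order[-k:]) / k
    return p_upper - p_lower

def nonfunctional_distractors(choice_counts, key, threshold=0.05):
    """Distractors chosen by fewer than `threshold` of examinees."""
    total = sum(choice_counts.values())
    return [opt for opt, c in choice_counts.items()
            if opt != key and c / total < threshold]
```

For example, an item chosen correctly by 2 of 4 students has a correct response rate of 0.5, and a distractor picked by 2 of 100 examinees would be flagged as nonfunctional under the 5% convention.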
Large language models can profoundly accelerate the development of high-quality assessment questions in medical education. However, expert review remains critical to address lapses in reliability and validity. A hybrid model, integrating AI-driven efficiencies with rigorous expert validation, may offer an optimal approach for enhancing educational outcomes.