Efficacy of large language models and their potential in Obstetrics and Gynecology education.

Author information

Eoh Kyung Jin, Kwon Gu Yeun, Lee Eun Jin, Lee JoonHo, Lee Inha, Kim Young Tae, Nam Eun Ji

Affiliations

Department of Obstetrics and Gynecology, Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Korea.

Department of Obstetrics and Gynecology, Institute of Women's Medical Life Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.

Publication information

Obstet Gynecol Sci. 2024 Nov;67(6):550-556. doi: 10.5468/ogs.24211. Epub 2024 Oct 2.

Abstract

OBJECTIVE

The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining the recent advancements in LLM technology and their transformative potential in artificial intelligence.

METHODS

This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and GPT-4 in understanding clinical information, as well as their potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals took an annual promotional examination. Of the 170 questions set over 4 years (2020-2023), 116 were analyzed after excluding 54 questions that contained images. The scores achieved by GPT-3.5, GPT-4, and 100 residents were compared.

RESULTS

The average scores across all 4 years for GPT-3.5 and GPT-4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For the first-, second-, and third-year resident groups, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4 and those of the residents. On obstetrics-specific questions, the average scores for GPT-3.5 and GPT-4 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively.
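As a rough illustration of how close GPT-4's mean score sits to each resident group, the reported means and SDs can be turned into standardized gaps. This is only a descriptive sketch using the summary figures quoted above; it is not the authors' statistical analysis (the abstract does not name the test used), and the function name `standardized_gap` is a hypothetical helper introduced here.

```python
# Summary statistics copied from the abstract: (mean score, SD).
gpt4_mean = 79.31
resident_groups = {
    "first-year": (79.12, 9.00),
    "second-year": (80.95, 5.86),
    "third-year": (83.60, 6.82),
}


def standardized_gap(model_mean, group_mean, group_sd):
    """Gap between the model's mean and a group's mean, in units of that group's SD.

    Hypothetical helper for illustration; not a significance test.
    """
    return (model_mean - group_mean) / group_sd


# Positive values mean GPT-4 scored above the group's mean, negative below.
gaps = {
    name: standardized_gap(gpt4_mean, mean, sd)
    for name, (mean, sd) in resident_groups.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} SD")
```

Under these figures, GPT-4's mean lies within one resident SD of every group's mean, which is in line with the reported absence of a statistically significant difference, although a gap expressed in SD units does not by itself establish significance.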

CONCLUSION

GPT-4 demonstrated exceptional performance in obstetrics, different types of data interpretation, and problem solving, showcasing the potential utility of LLMs in these areas. However, acknowledging the constraints of LLMs is crucial, and their use should augment human expertise and discernment.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5447/11581811/f2b3a0c9c643/ogs-24211f1.jpg
