Duran Alpay, Demiröz Anıl, Çörtük Oguz, Ok Bora, Özten Mustafa, Eroğlu Sinem
Aesthet Surg J. 2025 Mar 17;45(4):434-440. doi: 10.1093/asj/sjaf015.
Artificial intelligence-driven technologies offer transformative potential in plastic surgery, spanning preoperative planning, surgical procedures, and postoperative care, with the promise of improved patient outcomes.
To compare the web-based ChatGPT-4o (omni; OpenAI, San Francisco, CA) and Gemini Advanced (Alphabet Inc., Mountain View, CA), focusing on their data upload feature and examining outcomes before and after exposure to continuing medical education (CME) articles, particularly regarding their efficacy relative to human participants.
Participants and large language models (LLMs) completed 22 multiple-choice questions assessing baseline knowledge of CME topics. Initially, both the LLMs and the participants answered without access to the articles. The LLMs repeated the tests in incognito mode over 6 days. After the articles were made available, responses from both the LLMs and the participants were extracted and analyzed.
In the resident group, mean scores increased significantly after the article was read. Among the LLM groups, the ChatGPT-4o (omni) group showed no significant difference between pre- and postarticle scores, whereas the Gemini Advanced group demonstrated a significant increase. Both the ChatGPT-4o and Gemini Advanced groups achieved higher mean accuracy than the resident group in both the pre- and postarticle periods.
The comparison between human participants and LLMs indicates promising implications for the incorporation of LLMs into medical education. As these models increase in sophistication, they offer the potential to serve as supplementary tools within traditional learning environments, helping to bridge the gap between theoretical knowledge and practical implementation.