Soroudi Daniel, Rouhani Daniel S, Patel Alap, Sadjadi Ryan, Behnam-Hanona Reta, Oleck Nicholas C, Falade Israel, Piper Merisa, Hansen Scott L
University of California San Francisco, School of Medicine, San Francisco, CA, USA.
University of California San Francisco, Department of Surgery, Division of Plastic and Reconstructive Surgery, San Francisco, CA, USA.
Surg Open Sci. 2025 May 10;26:64-78. doi: 10.1016/j.sopen.2025.04.012. eCollection 2025 Jun.
BACKGROUND: Artificial intelligence (AI) has significantly influenced many medical fields, including plastic surgery. Large language model (LLM) chatbots such as ChatGPT, as well as text-to-image tools such as DALL-E and GPT-4o, are gaining broader adoption. This study explores the capabilities and limitations of these tools in hand surgery, focusing on their application in patient and medical education. METHODS: Common search terms were identified from Google Trends data and queried on ChatGPT-4.5 and ChatGPT-3.5 across the following categories: "Hand Anatomy", "Hand Fracture", "Hand Joint Injury", "Hand Tumor", and "Hand Dislocation". Responses were graded on a 1-5 scale for accuracy and evaluated using the Flesch-Kincaid Grade Level, the Patient Education Materials Assessment Tool (PEMAT), and the DISCERN instrument. GPT-4o, DALL-E 3, and DALL-E 2 generated visual representations of selected ChatGPT responses in each category, and these images were also evaluated. RESULTS: ChatGPT-4.5 achieved a DISCERN overall score of 3.80 ± 0.23. Its responses averaged 91.67 ± 0.29 for PEMAT understandability and 54.67 ± 0.55 for actionability. Accuracy was 4.47 ± 0.52, with a Flesch-Kincaid Grade Level of 9.26 ± 1.04. ChatGPT-4.5 consistently outperformed ChatGPT-3.5 across all evaluation metrics. For text-to-image generation, GPT-4o produced more accurate visuals than DALL-E 3 and DALL-E 2. CONCLUSIONS: This study highlights the strengths and limitations of ChatGPT-4.5 and GPT-4o in hand surgery education. While combining accurate text generation with image creation shows promise, these AI tools still need further refinement before widespread clinical adoption.
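The abstract reports readability via the Flesch-Kincaid Grade Level, which maps text statistics to a U.S. school grade using the standard formula 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. As a minimal sketch of how such a score could be computed, the snippet below implements that formula with a naive vowel-group syllable counter; the function names are illustrative, and published studies typically rely on validated readability software rather than a heuristic like this.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels, dropping a
    # trailing silent 'e'. Dictionary-based tools are more accurate.
    word = word.lower()
    if word.endswith("e") and len(word) > 2:
        word = word[:-1]
    vowel_groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(vowel_groups))

def flesch_kincaid_grade(text: str) -> float:
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A score near 9, as reported for ChatGPT-4.5, corresponds roughly to a ninth-grade reading level, above the sixth-grade level commonly recommended for patient education materials.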