
Dall-E in hand surgery: Exploring the utility of ChatGPT image generation.

Author information

Soroudi Daniel, Rouhani Daniel S, Patel Alap, Sadjadi Ryan, Behnam-Hanona Reta, Oleck Nicholas C, Falade Israel, Piper Merisa, Hansen Scott L

Affiliations

University of California San Francisco, School of Medicine, San Francisco, CA, USA.

University of California San Francisco, Department of Surgery, Division of Plastic and Reconstructive Surgery, San Francisco, CA, USA.

Publication information

Surg Open Sci. 2025 May 10;26:64-78. doi: 10.1016/j.sopen.2025.04.012. eCollection 2025 Jun.

Abstract

BACKGROUND

Artificial intelligence (AI) has significantly influenced many medical fields, including plastic surgery. Large language model (LLM) chatbots such as ChatGPT and text-to-image tools such as DALL-E and GPT-4o are gaining broader adoption. This study explores the capabilities and limitations of these tools in hand surgery, focusing on their application in patient and medical education.

METHODS

Common search terms were identified from Google Trends data in five categories ("Hand Anatomy", "Hand Fracture", "Hand Joint Injury", "Hand Tumor", and "Hand Dislocation") and queried on ChatGPT-4.5 and ChatGPT-3.5. Responses were graded for accuracy on a 1-5 scale and evaluated using the Flesch-Kincaid Grade Level, the Patient Education Materials Assessment Tool (PEMAT), and the DISCERN instrument. GPT-4o, DALL-E 3, and DALL-E 2 were then used to generate illustrations of selected ChatGPT responses in each category, and these images were also evaluated.
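Two of the instruments named in the Methods reduce to simple formulas. The sketch below shows how a Flesch-Kincaid Grade Level and a PEMAT percentage score could be computed, assuming the standard published FKGL coefficients (0.39, 11.8, 15.59) and the usual PEMAT convention (items rated 1 = agree, 0 = disagree, not-applicable items excluded). The function names are hypothetical, and the vowel-group syllable counter is a crude assumption; dedicated readability tools use dictionaries and more rules, so their values will differ. This is not the study's actual scoring pipeline.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count runs of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("bone"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def pemat_score(ratings: list) -> float:
    """Percentage score from item ratings: 1 = agree, 0 = disagree, None = N/A."""
    applicable = [r for r in ratings if r is not None]
    return 100.0 * sum(applicable) / len(applicable)


if __name__ == "__main__":
    print(round(flesch_kincaid_grade("The cat sat."), 2))
    print(round(pemat_score([1, 1, 0, None]), 2))
```

Understandability and actionability are scored separately, each with `pemat_score` applied to its own item set.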

RESULTS

ChatGPT-4.5 achieved an overall DISCERN score of 3.80 ± 0.23. Its responses averaged 91.67 ± 0.29 for PEMAT understandability and 54.67 ± 0.55 for actionability. Mean accuracy was 4.47 ± 0.52 on the 1-5 scale, with a Flesch-Kincaid Grade Level of 9.26 ± 1.04. ChatGPT-4.5 consistently outperformed ChatGPT-3.5 across all evaluation metrics. For text-to-image generation, GPT-4o produced more accurate visuals than DALL-E 3 and DALL-E 2.

CONCLUSIONS

This study highlights the strengths and limitations of ChatGPT-4.5 and GPT-4o in hand surgery education. While combining accurate text generation with image creation shows promise, these AI tools still need further refinement before widespread clinical adoption.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/178d/12143819/06c3db93b1ca/gr1.jpg