Lim Bryan, Seth Ishith, Cuomo Roberto, Kenney Peter Sinkjær, Ross Richard J, Sofiadellis Foti, Pentangelo Paola, Ceccaroni Alessandra, Alfano Carmine, Rozen Warren Matthew
Department of Plastic Surgery, Peninsula Health, Melbourne, Victoria, 3199, Australia.
Plastic Surgery Unit, Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, Italy.
Aesthetic Plast Surg. 2024 Nov;48(22):4712-4724. doi: 10.1007/s00266-024-04157-0. Epub 2024 Jun 19.
Abdominoplasty is a common operation performed for a range of cosmetic and functional indications, often in the context of rectus divarication, significant weight loss, or pregnancy. Despite its prevalence, gaps in patient-surgeon communication can hinder informed decision-making. The integration of large language models (LLMs) into healthcare may improve the information available to patients. This study evaluated the feasibility of using LLMs to answer perioperative queries.
This study assessed the efficacy of four leading LLMs (OpenAI's ChatGPT-3.5, Anthropic's Claude, Google's Gemini, and Bing's CoPilot) using fifteen unique prompts. Readability of all outputs was assessed with the Flesch-Kincaid Grade Level, the Flesch Reading Ease score, and the Coleman-Liau index. Quality was evaluated with the DISCERN score and a Likert scale. Scores were assigned by two plastic surgical residents, then reviewed and discussed by five specialist plastic surgeons until consensus was reached.
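For readers unfamiliar with the three readability measures named above, the following is a minimal Python sketch of their standard published formulas. It is not the tool used in the study, which the abstract does not specify. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption), so scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word):
    """Rough syllable estimate (assumption): count runs of vowels."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text):
    """Return the three readability scores for an English text."""
    # Naive tokenisation: sentences end at . ! ? and words are letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sent, n_words = len(sentences), len(words)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)

    words_per_sentence = n_words / n_sent
    syllables_per_word = syllables / n_words
    letters_per_100_words = letters / n_words * 100
    sentences_per_100_words = n_sent / n_words * 100

    return {
        # Higher score = easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        # Grade-level estimate based on letters rather than syllables.
        "coleman_liau_index": 0.0588 * letters_per_100_words
        - 0.296 * sentences_per_100_words
        - 15.8,
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.2f}")
```

A higher Flesch Reading Ease score indicates easier text, whereas the Flesch-Kincaid and Coleman-Liau values approximate the school grade level required, so a model that "requires the highest level for comprehension" would show high grade-level values and a low Reading Ease score.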
ChatGPT-3.5 required the highest reading level for comprehension, followed by Gemini, Claude, and then CoPilot. Claude provided the most appropriate and actionable advice. CoPilot was the most patient-friendly, enhancing engagement and the comprehensiveness of information. ChatGPT-3.5 and Gemini offered adequate, though unremarkable, advice and used more professional language. CoPilot uniquely included visual aids and was the only model to use hyperlinks, although these were of limited helpfulness and acceptability, and it was unable to respond fully to certain queries.
ChatGPT-3.5, Gemini, Claude, and Bing's CoPilot differed in readability and reliability. LLMs offer unique advantages for patient care but require careful selection. Future research should combine the strengths of individual LLMs and address their weaknesses to optimize patient education.
Level of Evidence V: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .