Lim Bryan, Lirios Gabriel, Sakalkale Aditya, Satheakeerthy Shriranshini, Hayes Diana, Yeung Justin M C
Department of Colorectal Surgery, Western Health, Melbourne, Australia.
Department of Surgery, Western Precinct, University of Melbourne, Melbourne, Australia.
ANZ J Surg. 2025 Mar;95(3):464-496. doi: 10.1111/ans.19337. Epub 2024 Dec 2.
Stomas present significant lifestyle and psychological challenges for patients, requiring comprehensive education and support. Current educational methods have limitations in offering relevant information to the patient, highlighting a potential role for artificial intelligence (AI). This study examined the utility of AI in enhancing stoma therapy management following colorectal surgery.
We compared four prominent large language models (LLMs), namely OpenAI's ChatGPT-3.5 and ChatGPT-4.0, Google's Gemini, and Bing's CoPilot, against a series of metrics to evaluate their suitability as supplementary clinical tools. Using qualitative and quantitative analyses, including readability scores (Flesch-Kincaid, Flesch Reading Ease, and Coleman-Liau index) and reliability assessments (Likert scale, DISCERN score, and QAMAI tool), the study assessed the appropriateness of LLM-generated advice for patients managing stomas.
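For readers unfamiliar with the readability scores named above, the following is a minimal sketch of how they are conventionally computed from word, sentence, syllable, and letter counts. The abstract does not say which software the authors used, so this is not their method. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic, so values will differ somewhat from dedicated tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate (assumption): count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return the three readability scores using their standard published formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sentences = len(sentences)
    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)
    n_letters = sum(ch.isalpha() for ch in text)

    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    letters_per_100_words = 100 * n_letters / n_words
    sentences_per_100_words = 100 * n_sentences / n_words

    return {
        # Higher means easier to read.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade level.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
        # Grade level based on letters rather than syllables.
        "coleman_liau_index": 0.0588 * letters_per_100_words
        - 0.296 * sentences_per_100_words
        - 15.8,
    }


if __name__ == "__main__":
    # Illustrative sentence only; not taken from the study's LLM responses.
    sample = "Empty your pouch when it is one third full. Check the skin each day."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

Flesch Reading Ease runs in the opposite direction to the other two: a higher value indicates easier text, whereas Flesch-Kincaid and Coleman-Liau approximate the school grade needed to read it.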
Readability and reliability varied across the evaluated models, with CoPilot and ChatGPT-4 demonstrating superior performance in several key metrics such as readability and comprehensiveness. However, the study underscores that LLM technology is still at an early stage in clinical applications. All responses required a high school to college level of education to comprehend comfortably. While the LLMs addressed users' questions directly, they did not incorporate patient-specific factors such as past medical history, so their responses were broad and generic rather than tailored.
The complexity of individual patient conditions can challenge AI systems. The use of LLMs in clinical settings holds promise for improving patient education and stoma management support, but requires careful consideration of the models' capabilities and the context of their use.