Ah-Yan Christophe, Boissonnault Ève, Boudier-Revéret Mathieu, Mares Christopher
Department of Physical Medicine and Rehabilitation, University of Montreal, Montreal, QC, Canada.
Department of Physical Medicine and Rehabilitation, Centre Hospitalier de l'Université de Montréal, Montreal, QC, Canada.
J Yeungnam Med Sci. 2025;42:11. doi: 10.12701/jyms.2024.01151. Epub 2024 Nov 29.
The self-management of low back pain (LBP) through patient information interventions offers significant benefits in terms of cost, reduced work absenteeism, and overall healthcare utilization. Using a large language model (LLM), such as ChatGPT (OpenAI) or Copilot (Microsoft), could enhance these outcomes further. Thus, it is important to evaluate how well the LLMs ChatGPT and Copilot provide medical advice for LBP and to assess the impact of clinical context on the quality of their responses.
This was a qualitative comparative observational study. It was conducted within the Department of Physical Medicine and Rehabilitation, University of Montreal in Montreal, QC, Canada. ChatGPT and Copilot were used to answer 27 common questions related to LBP, with and without a specific clinical context. The responses were evaluated by physiatrists for validity, safety, and usefulness using a 4-point Likert scale (4, most favorable).
Both ChatGPT and Copilot demonstrated good performance across all measures. Validity scores were 3.33 for ChatGPT and 3.18 for Copilot, safety scores were 3.19 for ChatGPT and 3.13 for Copilot, and usefulness scores were 3.60 for ChatGPT and 3.57 for Copilot. The inclusion of clinical context did not significantly change the results.
LLMs, such as ChatGPT and Copilot, can provide reliable medical advice on LBP, irrespective of the detailed clinical context, supporting their potential to aid in patient self-management.