Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.
Acta Otolaryngol. 2023 Sep;143(9):779-782. doi: 10.1080/00016489.2023.2254809. Epub 2023 Sep 11.
Many patients seek health information online, and large language models (LLMs) may generate an increasing share of it.
This study evaluates the quality of the health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information.
Physicians from a tertiary otorhinolaryngology department rated the chatbot's responses on a Likert scale for accuracy, relevance, and depth. ChatGPT was also asked to rate the same responses.
Among the physician raters, the composite mean across the three categories was 3.41. Relevance scored highest (mean = 3.71), followed by accuracy (mean = 3.51) and depth (mean = 3.00). ChatGPT rated all three categories as 5.
Although the chatbot showed potential for providing relevant and accurate medical information, its responses lacked depth and may perpetuate biases because the model is trained on publicly available text. In conclusion, LLMs show promise in healthcare, but further refinement is needed to increase response depth and mitigate potential biases.
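As a quick consistency check, the reported composite of 3.41 can be reproduced as the unweighted mean of the three category means. The minimal Python sketch below assumes the composite is defined that way (equivalently, that each category received the same number of ratings); the abstract does not state the exact formula, and the function name `composite_mean` is hypothetical.

```python
# Category means reported for the physician raters.
category_means = {"accuracy": 3.51, "relevance": 3.71, "depth": 3.00}


def composite_mean(means):
    """Unweighted mean of per-category means (hypothetical helper).

    Assumes every category received the same number of ratings;
    with unequal counts, the pooled mean of all ratings could differ.
    """
    return sum(means.values()) / len(means)


composite = composite_mean(category_means)
print(f"{composite:.2f}")  # prints 3.41
```

The unrounded value is about 3.407, which rounds to the 3.41 reported above.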