Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department.

Affiliation

Department of Otorhinolaryngology-Head and Neck Surgery and Audiology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark.

Publication information

Acta Otolaryngol. 2023 Sep;143(9):779-782. doi: 10.1080/00016489.2023.2254809. Epub 2023 Sep 11.

DOI:10.1080/00016489.2023.2254809

PMID:37694729

Abstract

BACKGROUND

Many patients seek health information online, and large language models (LLMs) may generate a growing share of that information.

AIM

This study evaluates the quality of the health information provided by ChatGPT, an LLM developed by OpenAI, focusing on its utility as a source of otolaryngology-related patient information.

MATERIAL AND METHOD

A variety of doctors from a tertiary otorhinolaryngology department used a Likert scale to assess the chatbot's responses in terms of accuracy, relevance, and depth. The responses were also evaluated by ChatGPT.

RESULTS

The composite mean of the three categories was 3.41, with the highest performance noted in the relevance category (mean = 3.71) when evaluated by the respondents. The accuracy and depth categories yielded mean scores of 3.51 and 3.00, respectively. All the categories were rated as 5 when evaluated by ChatGPT.
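The composite figure can be checked against the three category means. The sketch below assumes that the composite is the unweighted mean of the category means (equivalent to the pooled mean only if every category received the same number of ratings); the abstract does not state the exact aggregation, so this is an illustrative check, not the authors' method. The variable names are hypothetical.

```python
from statistics import mean

# Category means from the doctors' Likert ratings, as reported in the abstract.
category_means = {
    "relevance": 3.71,
    "accuracy": 3.51,
    "depth": 3.00,
}

# Assumption: composite = unweighted mean of the three category means.
composite = mean(list(category_means.values()))

# Category with the highest mean rating.
best_category = max(category_means, key=category_means.get)

print(f"composite mean = {composite:.2f}, highest = {best_category}")
```

Under this assumption the composite works out to 10.22 / 3 ≈ 3.41, matching the reported value, with relevance as the highest-rated category.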

CONCLUSION AND SIGNIFICANCE

Despite its potential for providing relevant and accurate medical information, the chatbot's responses lacked depth and may perpetuate biases arising from its training on publicly available text. While LLMs show promise in healthcare, further refinement is needed to enhance response depth and mitigate potential biases.
