Kim Jiyoung, Lee Seo-Young, Kim Jee Hyun, Shin Dong-Hyeon, Oh Eun Hye, Kim Jin A, Cho Jae Wook
Department of Neurology and Sleep Disorder Center, Bio Medical Research Institute, Pusan National University School of Medicine, Pusan National University Hospital, Busan, South Korea.
Department of Neurology, School of Medicine, Kangwon National University, Chuncheon, South Korea.
Sleep Health. 2024 Dec;10(6):665-670. doi: 10.1016/j.sleh.2024.08.011. Epub 2024 Sep 21.
Many individuals turn to the Internet, including generative artificial intelligence tools such as ChatGPT, for sleep-related information before consulting medical professionals. This study compared responses from sleep disorder specialists and ChatGPT to common sleep queries, with both experts and laypersons evaluating the responses' accuracy and clarity.
We assessed responses from sleep medicine specialists and ChatGPT-4 to 140 sleep-related questions from the Korean Sleep Research Society's website. In a blinded study design, sleep disorder experts and laypersons rated the medical helpfulness, emotional supportiveness, and sentence comprehensibility of the responses on a 1-5 scale.
Laypersons rated ChatGPT higher for medical helpfulness (3.79 ± 0.90 vs. 3.44 ± 0.99, p < .001), emotional supportiveness (3.48 ± 0.79 vs. 3.12 ± 0.98, p < .001), and sentence comprehensibility (4.24 ± 0.79 vs. 4.14 ± 0.96, p = .028). Experts also rated ChatGPT higher for emotional supportiveness (3.33 ± 0.62 vs. 3.01 ± 0.67, p < .001) but preferred specialists' responses for sentence comprehensibility (4.15 ± 0.74 vs. 3.94 ± 0.90, p < .001). For medical helpfulness, experts rated specialists' responses slightly higher than ChatGPT's, though the difference was not statistically significant (3.70 ± 0.84 vs. 3.63 ± 0.87, p = .109). Overall, experts slightly preferred specialists' responses (56.0%), while laypersons favored ChatGPT's (54.3%; p < .001). ChatGPT's responses were also significantly longer (186.76 ± 39.04 vs. 113.16 ± 95.77 words, p < .001).
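The abstract does not name the statistical test behind these p-values. For paired 1-5 Likert ratings of two answers to the same set of questions, a Wilcoxon signed-rank test is a common choice; the Python sketch below illustrates that kind of paired comparison on simulated data. The choice of test, the simulated values, and all variable names are assumptions for illustration, not the authors' actual analysis.

```python
# Hypothetical sketch: comparing paired 1-5 ratings of ChatGPT vs. specialist
# answers to the same 140 questions. The data are simulated to roughly match
# the reported layperson means for medical helpfulness (3.79 vs. 3.44); the
# paper's actual per-item ratings and test are not given in the abstract.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_items = 140  # number of sleep-related questions in the study

# Simulated per-question ratings on a 1-5 scale, clipped to the scale bounds.
chatgpt = np.clip(rng.normal(3.79, 0.90, n_items), 1, 5)
specialist = np.clip(rng.normal(3.44, 0.99, n_items), 1, 5)

# Paired, nonparametric comparison (suits ordinal Likert-type data).
w_stat, p_value = stats.wilcoxon(chatgpt, specialist)
print(f"ChatGPT:    {chatgpt.mean():.2f} +/- {chatgpt.std(ddof=1):.2f}")
print(f"Specialist: {specialist.mean():.2f} +/- {specialist.std(ddof=1):.2f}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={p_value:.4f}")
```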
Generative artificial intelligence like ChatGPT may help disseminate sleep-related medical information online. Laypersons appear to prefer ChatGPT's detailed, emotionally supportive responses over those from sleep disorder specialists.