Bragazzi Nicola Luigi, Garbarino Sergio
Human Nutrition Unit, Department of Food and Drugs, University of Parma, Parma, Italy.
Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal/Child Sciences, University of Genoa, Genoa, Italy.
JMIR Form Res. 2024 Apr 16;8:e55762. doi: 10.2196/55762.
Adequate sleep is essential for maintaining individual and public health, positively affecting cognition and well-being, and reducing chronic disease risks. It plays a significant role in driving the economy, public safety, and managing health care costs. Digital tools, including websites, sleep trackers, and apps, are key to promoting sleep health education. Conversational artificial intelligence (AI) such as ChatGPT (OpenAI, Microsoft Corp) offers accessible, personalized advice on sleep health but raises concerns about potential misinformation. Given the significant impact of AI-driven sleep health information on individual and public health, and on the spread of sleep-related myths, ensuring that this information is accurate is important.
This study aims to examine ChatGPT's capability to debunk sleep-related misbeliefs.
A mixed methods design was used. ChatGPT categorized 20 sleep-related myths identified by 10 sleep experts and rated them for falseness and public health significance on a 5-point Likert scale. Sensitivity, positive predictive value, and interrater agreement were calculated, and a qualitative comparative analysis was conducted.
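The abstract does not state which intraclass correlation coefficient (ICC) form was used to measure interrater agreement. As an illustration only, here is a minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss). It works on an items × raters matrix. The choice of ICC form, the helper name `icc_2_1`, and the toy ratings are assumptions. They are not the study's method or data.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n x k matrix: one row per rated item (e.g. a myth),
    one column per rater. Hypothetical helper for illustration only.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with >= 2 items and >= 2 raters")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into items (rows), raters (columns),
    # and residual error.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares for each source of variation.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: systematic rater offsets (ms_cols) lower the ICC.
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC undefined: ratings have no variance")
    return (ms_rows - ms_error) / denominator


# Toy Likert ratings (hypothetical, not study data): 4 items x 2 raters,
# where the second rater scores every item exactly 1 point higher.
print(icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]]))  # approx. 0.769 (= 10/13)
```

Because this form measures absolute agreement, a rater who is consistently one point higher still lowers the ICC below 1. A consistency form such as ICC(3,1) would ignore that offset.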
ChatGPT labeled most of the statements (n=17, 85%) as "false" (n=9, 45%) or "generally false" (n=8, 40%), with accuracy varying across domains. For instance, it correctly identified most myths about "sleep timing," "sleep duration," and "behaviors during sleep," whereas its success varied for other categories such as "pre-sleep behaviors" and "brain function and sleep." On the 5-point Likert scale, ChatGPT's ratings of falseness and public health significance averaged 3.45 (SD 0.87) and 3.15 (SD 0.99), respectively, indicating a good level of accuracy in identifying the falseness of statements and a good understanding of their impact on public health. The AI-based tool showed a sensitivity of 85% and a positive predictive value of 100%. Overall, this indicates that when ChatGPT labels a statement as false, the label is highly reliable, but it may fail to identify some false statements. Compared with expert ratings, ChatGPT's appraisals showed high intraclass correlation coefficients (ICCs), suggesting that the AI's ratings were generally aligned with expert views on the falseness (ICC=.83, P<.001) and public health significance (ICC=.79, P=.001) of sleep-related myths. Qualitatively, both ChatGPT and sleep experts refuted sleep-related misconceptions. However, ChatGPT adopted a more accessible style and provided a more generalized view focused on broad concepts, whereas experts sometimes used technical jargon and provided evidence-based explanations.
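The reported sensitivity (85%) and positive predictive value (100%) can be reproduced from the label counts under one reading of the design. That reading assumes all 20 expert-identified statements are myths (truly false) and counts a "false" or "generally false" label as a correct positive call. This gives 17 true positives, 3 false negatives, and no false positives. Here is a minimal sketch of that arithmetic. The helper names are hypothetical, and the counting rule is an assumption rather than the authors' stated procedure.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly false statements that were labeled false: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of 'false' labels that were correct: TP / (TP + FP)."""
    return tp / (tp + fp)


# Counts from the abstract. Treating "false" and "generally false" as positive
# calls, and all 20 expert-identified myths as truly false, is an assumption.
N_MYTHS = 20
tp = 9 + 8          # "false" (n=9) + "generally false" (n=8) = 17
fn = N_MYTHS - tp   # myths not labeled false = 3
fp = 0              # assumed set has no true statements, so FP is 0 by construction

print(sensitivity(tp, fn))                # 0.85
print(positive_predictive_value(tp, fp))  # 1.0
```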
ChatGPT-4 can accurately address sleep-related queries and debunk sleep-related myths, with performance comparable to that of sleep experts. Given its limitations, however, the AI cannot completely replace expert opinion, especially in nuanced and complex fields such as sleep health. It can nonetheless be a valuable complement in disseminating updated information and promoting healthy behaviors.