Assessing the Accuracy of Generative Conversational Artificial Intelligence in Debunking Sleep Health Myths: Mixed Methods Comparative Study With Expert Analysis.

Author information

Bragazzi Nicola Luigi, Garbarino Sergio

Affiliations

Human Nutrition Unit, Department of Food and Drugs, University of Parma, Parma, Italy.

Department of Neuroscience, Rehabilitation, Ophthalmology, Genetics and Maternal/Child Sciences, University of Genoa, Genoa, Italy.

Publication information

JMIR Form Res. 2024 Apr 16;8:e55762. doi: 10.2196/55762.

DOI:10.2196/55762

PMID:38501898

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11061787/

Abstract

BACKGROUND

Adequate sleep is essential for maintaining individual and public health, positively affecting cognition and well-being, and reducing chronic disease risks. It plays a significant role in driving the economy, supporting public safety, and managing health care costs. Digital tools, including websites, sleep trackers, and apps, are key in promoting sleep health education. Conversational artificial intelligence (AI) such as ChatGPT (OpenAI, Microsoft Corp) offers accessible, personalized advice on sleep health but raises concerns about potential misinformation. This underscores the importance of ensuring that AI-driven sleep health information is accurate, given its significant impact on individual and public health and on the spread of sleep-related myths.

OBJECTIVE

This study aims to examine ChatGPT's capability to debunk sleep-related misconceptions.

METHODS

A mixed methods design was used. ChatGPT categorized 20 sleep-related myths identified by 10 sleep experts and rated each on a 5-point Likert scale for falseness and public health significance. Sensitivity, positive predictive value, and interrater agreement were calculated, and a qualitative comparative analysis was conducted.

RESULTS

ChatGPT labeled a significant portion (n=17, 85%) of the statements as "false" (n=9, 45%) or "generally false" (n=8, 40%), with varying accuracy across different domains. For instance, it correctly identified most myths about "sleep timing," "sleep duration," and "behaviors during sleep," while it had varying degrees of success with other categories such as "pre-sleep behaviors" and "brain function and sleep." ChatGPT's assessment of the degree of falseness and public health significance, on the 5-point Likert scale, revealed an average score of 3.45 (SD 0.87) and 3.15 (SD 0.99), respectively, indicating a good level of accuracy in identifying the falseness of statements and a good understanding of their impact on public health. The AI-based tool showed a sensitivity of 85% and a positive predictive value of 100%. Overall, this indicates that when ChatGPT labels a statement as false, it is highly reliable, but it may miss identifying some false statements. When comparing with expert ratings, high intraclass correlation coefficients (ICCs) between ChatGPT's appraisals and expert opinions could be found, suggesting that the AI's ratings were generally aligned with expert views on falseness (ICC=.83, P<.001) and public health significance (ICC=.79, P=.001) of sleep-related myths. Qualitatively, both ChatGPT and sleep experts refuted sleep-related misconceptions. However, ChatGPT adopted a more accessible style and provided a more generalized view, focusing on broad concepts, while experts sometimes used technical jargon, providing evidence-based explanations.
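The reported sensitivity and positive predictive value can be reproduced from the counts given above. The following is a minimal sketch, not the authors' analysis code. It assumes that all 20 statements are in fact false (they were selected as myths by the experts), so the 17 statements labeled "false" or "generally false" are true positives, the remaining 3 are false negatives, and there are no false positives. The function and variable names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly false statements that were flagged as false."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Share of statements flagged as false that are truly false."""
    return tp / (tp + fp)


# Counts inferred from the abstract (assumption: every statement is a myth).
total_myths = 20        # statements identified by the sleep experts
flagged_false = 17      # labeled "false" (9) or "generally false" (8)

tp = flagged_false                  # myths correctly flagged
fn = total_myths - flagged_false    # myths not flagged
fp = 0                              # no true statements in the set

print(f"sensitivity = {sensitivity(tp, fn):.0%}")
print(f"PPV = {positive_predictive_value(tp, fp):.0%}")
```

Under these assumptions the sketch yields 85% and 100%, matching the reported values. Because the statement set contains no true statements, specificity and negative predictive value cannot be estimated from it.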

CONCLUSIONS

ChatGPT-4 can accurately address sleep-related queries and debunk sleep-related myths, with performance comparable to that of sleep experts. Given its limitations, however, the AI cannot completely replace expert opinion, especially in nuanced and complex fields such as sleep health; it can nonetheless be a valuable complement in disseminating up-to-date information and promoting healthy behaviors.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc10/11061787/c25b38cf41d2/formative_v8i1e55762_fig1.jpg
