Assessing the Accuracy of ChatGPT in Answering Questions About Prolonged Disorders of Consciousness.

Author Information

Bagnato Sergio, Boccagni Cristina, Bonavita Jacopo

Affiliation

Villa Rosa Rehabilitation Hospital, Provincial Agency for Health Services (APSS) of Trento, 38057 Pergine Valsugana, Italy.

Publication Information

Brain Sci. 2025 Apr 13;15(4):392. doi: 10.3390/brainsci15040392.

Abstract

Background: Prolonged disorders of consciousness (DoC) present complex diagnostic and therapeutic challenges. This study aimed to evaluate the accuracy of two ChatGPT models (ChatGPT 4o and ChatGPT o1) in answering questions about prolonged DoC, framed as if they were posed by a patient's relative. Secondary objectives included comparing performance across languages (English vs. Italian) and assessing whether responses conveyed an empathetic tone. Methods: Fifty-seven open-ended questions reflecting common caregiver concerns were generated in both English and Italian, each categorized into one of three domains: clinical data, instrumental diagnostics, or therapy. Each question contained a background context followed by a specific query and was submitted once to both models. Two reviewers evaluated the responses on a four-point scale, ranging from "incorrect and potentially misleading" to "correct and complete". Discrepancies were resolved by a third reviewer. Accuracy, language differences, empathy, and recommendations to consult a healthcare professional were analyzed using absolute frequencies, percentages, the Mann-Whitney U test, and Chi-squared tests. Results: A total of 228 responses were analyzed. Both models provided predominantly correct answers (80.7-96.8%), with English responses achieving higher accuracy only for ChatGPT 4o on clinical data. ChatGPT 4o exhibited greater empathy in its responses, whereas ChatGPT o1 more frequently recommended consulting a healthcare professional in Italian. Conclusions: Both ChatGPT models demonstrated high accuracy in addressing prolonged DoC queries, highlighting their potential usefulness for caregiver support. However, occasional inaccuracies emphasize the importance of verifying chatbot-generated information with professional medical advice.
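To illustrate the two test statistics named in the abstract, the following sketch computes a Mann-Whitney U statistic (e.g., for comparing four-point accuracy ratings across languages) and a Pearson Chi-squared statistic (e.g., for comparing empathy frequencies between models) in pure Python. All data values below are invented for illustration and are not the study's data.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for sample x vs. sample y (mid-ranks for ties)."""
    combined = sorted(x + y)
    # Assign each distinct value the average of the ranks it occupies
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    r1 = sum(ranks[v] for v in x)  # rank sum of the first sample
    return r1 - len(x) * (len(x) + 1) / 2

def chi_squared(table):
    """Pearson Chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Invented four-point accuracy ratings (1 = incorrect/misleading ... 4 = correct/complete)
english_ratings = [4, 4, 3, 4, 2, 4]
italian_ratings = [3, 4, 3, 2, 3, 4]
print("U =", mann_whitney_u(english_ratings, italian_ratings))

# Invented 2x2 table: [empathetic, not empathetic] responses per model
empathy_table = [[45, 12],   # model A
                 [28, 29]]   # model B
print("Chi-squared =", round(chi_squared(empathy_table), 2))
```

In practice one would use a statistics package (e.g., `scipy.stats.mannwhitneyu` and `scipy.stats.chi2_contingency`) to also obtain p-values; this sketch only shows how the raw statistics are formed from the ratings and frequency counts the study describes.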

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9e2/12025412/48c6b761d004/brainsci-15-00392-g001.jpg
