Gül Şanser, Erdemir İsmail, Hanci Volkan, Aydoğmuş Evren, Erkoç Yavuz Selim
Department of Neurosurgery, Ankara Ataturk Sanatorium Education and Research Hospital, Ankara, Turkey.
Department of Anesthesiology and Critical Care, Faculty of Medicine, Dokuz Eylül University, Izmir, Turkey.
Medicine (Baltimore). 2024 May 3;103(18):e38009. doi: 10.1097/MD.0000000000038009.
Subdural hematoma is defined as a collection of blood in the subdural space between the dura mater and the arachnoid. It is a condition that neurosurgeons frequently encounter, and it occurs in acute, subacute, and chronic forms. The incidence in adults is reported to be 1.72 to 20.60 per 100,000 people annually. Our study aimed to evaluate the quality, reliability, and readability of the answers given by ChatGPT, Bard, and Perplexity to questions about "Subdural Hematoma." In this observational, cross-sectional study, we asked ChatGPT, Bard, and Perplexity separately to provide the 100 most frequently asked questions about "Subdural Hematoma." The responses from all 3 chatbots were analyzed separately for readability, quality, reliability, and adequacy. When the median readability scores of the ChatGPT, Bard, and Perplexity answers were compared with the sixth-grade reading level, a statistically significant difference was observed for all formulas (P < .001). The responses of all 3 chatbots were found to be difficult to read. Bard's responses were more readable than ChatGPT's (P < .001) and Perplexity's (P < .001) for all scores evaluated. Although the evaluated readability calculators differed in their results, Perplexity's answers were determined to be more readable than ChatGPT's (P < .05). Bard's answers had the best Global Quality Scale (GQS) scores (P < .001). Perplexity's responses had the best Journal of the American Medical Association (JAMA) and modified DISCERN scores (P < .001). The current capabilities of ChatGPT, Bard, and Perplexity are inadequate in terms of the quality and readability of text content related to "Subdural Hematoma." The readability standard for patient education materials, as determined by the American Medical Association, the National Institutes of Health, and the United States Department of Health and Human Services, is at or below the sixth-grade level. The readability levels of the responses of artificial intelligence applications such as ChatGPT, Bard, and Perplexity are significantly higher than this recommended sixth-grade level.
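The abstract does not name the individual readability formulas applied; as an illustrative assumption, one of the most widely used is the Flesch-Kincaid Grade Level (FKGL), for which the sixth-grade standard cited above corresponds to a score of 6.0 or below:

$$\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$

Indices of this kind weight average sentence length and word complexity, so both long sentences and polysyllabic medical vocabulary push chatbot answers above the grade-6 target.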