Gül Şanser, Erdemir İsmail, Hanci Volkan, Aydoğmuş Evren, Erkoç Yavuz Selim
Department of Neurosurgery, Ankara Ataturk Sanatorium Education and Research Hospital, Ankara, Turkey.
Department of Anesthesiology and Critical Care, Faculty of Medicine, Dokuz Eylül University, Izmir, Turkey.
Medicine (Baltimore). 2024 May 3;103(18):e38009. doi: 10.1097/MD.0000000000038009.
Subdural hematoma is defined as a collection of blood in the subdural space between the dura mater and the arachnoid. It is a condition that neurosurgeons frequently encounter, and it occurs in acute, subacute, and chronic forms. The incidence in adults is reported to be 1.72 to 20.60 per 100,000 people annually. Our study aimed to evaluate the quality, reliability, and readability of the answers given by ChatGPT, Bard, and Perplexity to questions about "Subdural Hematoma." In this observational, cross-sectional study, we asked ChatGPT, Bard, and Perplexity separately to provide the 100 most frequently asked questions about "Subdural Hematoma." The responses from all 3 chatbots were analyzed separately for readability, quality, reliability, and adequacy. When the median readability scores of the ChatGPT, Bard, and Perplexity answers were compared with the sixth-grade reading level, a statistically significant difference was observed for all formulas (P < .001). The responses of all 3 chatbots were found to be difficult to read. Bard's responses were more readable than ChatGPT's (P < .001) and Perplexity's (P < .001) for all scores evaluated. Although the evaluated readability calculators differed in their results, Perplexity's answers were determined to be more readable than ChatGPT's (P < .05). Bard's answers had the best Global Quality Scale (GQS) scores (P < .001). Perplexity's responses had the best Journal of the American Medical Association (JAMA) and modified DISCERN scores (P < .001). The current capabilities of ChatGPT, Bard, and Perplexity are inadequate in terms of the quality and readability of text content related to "Subdural Hematoma." The readability standard for patient education materials, as determined by the American Medical Association, the National Institutes of Health, and the United States Department of Health and Human Services, is at or below the sixth-grade level. The readability levels of the responses of artificial intelligence applications such as ChatGPT, Bard, and Perplexity are significantly higher than this recommended sixth-grade level.
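The abstract does not name the individual readability formulas applied; as an illustrative assumption, one of the most widely used is the Flesch-Kincaid Grade Level (FKGL), for which the sixth-grade standard cited above corresponds to a score of 6.0 or below:

$$\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59$$

Indices of this kind weight average sentence length and word complexity, so both long sentences and polysyllabic medical vocabulary push chatbot answers above the grade-6 target.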