
Comparative assessment of artificial intelligence chatbots' performance in responding to healthcare professionals' and caregivers' questions about Dravet syndrome.

Author Information

Jesus-Ribeiro Joana, Roza Eugenia, Oliveiros Bárbara, Melo Joana Barbosa, Carreño Mar

Affiliations

Coimbra Institute for Clinical and Biomedical Research (iCBR) - Center of Investigation on Environment Genetics and Oncobiology (CIMAGO), Faculty of Medicine, University of Coimbra, Coimbra, Portugal.

Neurology Department, Unidade Local de Saúde da Região de Leiria, Leiria, Portugal.

Publication Information

Epilepsia Open. 2025 Apr 1. doi: 10.1002/epi4.70022.

DOI: 10.1002/epi4.70022
PMID: 40167029
Abstract

OBJECTIVE

Artificial intelligence chatbots have been a game changer in healthcare, providing immediate, round-the-clock assistance. However, their accuracy in specific medical domains remains under-evaluated. Dravet syndrome is one of the most challenging epileptic encephalopathies, with new data continuously emerging in the literature. This study aims to evaluate and compare the performance of ChatGPT 3.5 and Perplexity in responding to questions about Dravet syndrome.

METHODS

We curated 96 questions about Dravet syndrome: 43 from healthcare professionals and 53 from caregivers. Two epileptologists independently graded the chatbots' responses, with a third senior epileptologist resolving disagreements to reach a final consensus. Accuracy, and the completeness of correct answers, were rated on predefined 3-point scales. For responses rated incorrect, the chatbots were prompted to self-correct and the revised answers were re-evaluated. Readability was assessed using the Flesch reading ease score and the Flesch-Kincaid grade level.
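
For reference, both readability metrics are standard linear functions of average sentence length and average syllables per word. The sketch below is a minimal, self-contained approximation; the vowel-run syllable counter is a heuristic, and published analyses typically rely on dedicated readability tools with more refined rules.

    import re

    def count_syllables(word):
        # Heuristic: count runs of vowels, dropping one silent trailing 'e'.
        word = word.lower()
        n = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and n > 1:
            n -= 1
        return max(n, 1)

    def readability(text):
        # Flesch reading ease (FRE) and Flesch-Kincaid grade level (FKGL)
        # from average sentence length (ASL) and syllables per word (ASW).
        sentences = max(len(re.findall(r"[.!?]+", text)), 1)
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        asl = len(words) / sentences
        asw = syllables / len(words)
        fre = 206.835 - 1.015 * asl - 84.6 * asw
        fkgl = 0.39 * asl + 11.8 * asw - 15.59
        return fre, fkgl

    print(readability("Dravet syndrome is a severe epileptic encephalopathy."))

On the conventional Flesch scale, scores of 0-30 are read as "very difficult, best understood by university graduates", so the means reported below (24.1 and 29.0) both fall in the hardest band, which is the basis for the abstract's "advanced reading level" remark.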

RESULTS

Both chatbots had the majority of their responses rated as "correct" (ChatGPT 3.5: 66.7%, Perplexity: 81.3%), with no significant difference in performance between the two (χ² = 5.30, p = 0.071). ChatGPT 3.5 performed significantly better for caregivers than for healthcare professionals (χ² = 7.27, p = 0.026). The topic with the poorest performance was the treatment of Dravet syndrome, particularly for healthcare professionals' questions. Both models exhibited exemplary completeness, with most responses rated as "complete" to "comprehensive" (ChatGPT 3.5: 73.4%, Perplexity: 75.7%). Substantial self-correction capabilities were observed: ChatGPT 3.5 improved 55.6% of its incorrect responses and Perplexity 80%. The texts were generally very difficult to read, requiring an advanced reading level. However, Perplexity's responses were significantly more readable than ChatGPT 3.5's [Flesch reading ease: 29.0 (SD 13.9) vs. 24.1 (SD 15.0), p = 0.018].
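
For context, χ² = 5.30 with p = 0.071 corresponds to two degrees of freedom, consistent with a 2 (chatbot) × 3 (accuracy rating) contingency table. A minimal sketch of such a test with scipy.stats follows; the cell counts are illustrative reconstructions (64/96 = 66.7% and 78/96 = 81.3% match the reported "correct" rates, but the split across the remaining categories is hypothetical, since the abstract reports percentages rather than raw cells).

    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: chatbots; columns: correct / partially correct / incorrect.
    # Counts are illustrative, not the study's raw data.
    table = np.array([
        [64, 20, 12],  # ChatGPT 3.5 -> 64/96 = 66.7% correct
        [78, 12, 6],   # Perplexity  -> 78/96 = 81.3% correct
    ])

    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")

This illustrative table happens to give χ² ≈ 5.38, p ≈ 0.068 with dof = 2, close to the reported values; reproducing χ² = 5.30 exactly would require the study's actual cell counts.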

SIGNIFICANCE

Our findings underscore the potential of AI chatbots in delivering accurate and complete responses to Dravet syndrome queries. However, they have limitations, particularly in complex areas like treatment. Continuous efforts to update information and improve readability are essential.

PLAIN LANGUAGE SUMMARY

Artificial intelligence chatbots have the potential to improve access to medical information, including on conditions like Dravet syndrome, but the quality of this information is still unclear. In this study, ChatGPT 3.5 and Perplexity correctly answered most questions from healthcare professionals and caregivers, with ChatGPT 3.5 performing better for caregivers. Treatment-related questions had the most incorrect answers, particularly those from healthcare professionals. Both chatbots demonstrated the ability to correct previous incorrect responses, particularly Perplexity. Both chatbots produced text requiring advanced reading skills. Further improvements are needed to make the text easier to understand and address difficult medical topics.


Similar Articles

1. Comparative assessment of artificial intelligence chatbots' performance in responding to healthcare professionals' and caregivers' questions about Dravet syndrome.
Epilepsia Open. 2025 Apr 1. doi: 10.1002/epi4.70022.
2. Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
3. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.
J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.
4. Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.
Medicine (Baltimore). 2025 Mar 14;104(11):e41780. doi: 10.1097/MD.0000000000041780.
5. Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.
Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.
6. How artificial intelligence can provide information about subdural hematoma: Assessment of readability, reliability, and quality of ChatGPT, BARD, and perplexity responses.
Medicine (Baltimore). 2024 May 3;103(18):e38009. doi: 10.1097/MD.0000000000038009.
7. Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study.
J Med Internet Res. 2024 Nov 4;26:e60291. doi: 10.2196/60291.
8. Performance of Artificial Intelligence Chatbots in Responding to Patient Queries Related to Traumatic Dental Injuries: A Comparative Study.
Dent Traumatol. 2025 Jun;41(3):338-347. doi: 10.1111/edt.13020. Epub 2024 Nov 22.
9. Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study.
Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.
10. Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain.
PeerJ. 2025 Jan 22;13:e18847. doi: 10.7717/peerj.18847. eCollection 2025.