
Assessing the response quality and readability of chatbots in cardiovascular health, oncology, and psoriasis: A comparative study.

Author Affiliations

Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland; Department of Ultrasound, Institute of Fundamental Technological Research, Polish Academy of Sciences.

Gerontology, Public Health and Education Department, National Institute of Geriatrics, Rheumatology and Rehabilitation, Warsaw, Poland.

Publication Information

Int J Med Inform. 2024 Oct;190:105562. doi: 10.1016/j.ijmedinf.2024.105562. Epub 2024 Jul 19.

DOI: 10.1016/j.ijmedinf.2024.105562
PMID: 39059084
Abstract

BACKGROUND: Chatbots using the Large Language Model (LLM) generate human responses to questions from all categories. Due to staff shortages in healthcare systems, patients waiting for an appointment increasingly use chatbots to get information about their condition. Given the number of chatbots currently available, assessing the responses they generate is essential.

METHODS: Five chatbots with free access were selected (Gemini, Microsoft Copilot, PiAI, ChatGPT, ChatSpot) and blinded using letters (A, B, C, D, E). Each chatbot was asked questions about cardiology, oncology, and psoriasis. Responses were compared to guidelines from the European Society of Cardiology, American Academy of Dermatology and American Society of Clinical Oncology. All answers were assessed using readability scales (Flesch Reading Scale, Gunning Fog Scale Level, Flesch-Kincaid Grade Level and Dale-Chall Score). Using a 3-point Likert scale, two independent medical professionals assessed the compliance of the responses with the guidelines.

RESULTS: A total of 45 questions were asked of all chatbots. Chatbot C gave the shortest answers, 7.0 (6.0 - 8.0), and Chatbot A the longest, 17.5 (13.0 - 24.5). The Flesch Reading Ease Scale ranged from 16.3 (12.2 - 21.9) (Chatbot D) to 39.8 (29.0 - 50.4) (Chatbot A). Flesch-Kincaid Grade Level ranged from 12.5 (10.6 - 14.6) (Chatbot A) to 15.9 (15.1 - 17.1) (Chatbot D). Gunning Fog Scale Level ranged from 15.77 (Chatbot A) to 19.73 (Chatbot D). Dale-Chall Score ranged from 10.3 (9.3 - 11.3) (Chatbot A) to 11.9 (11.5 - 12.4) (Chatbot D).

CONCLUSION: This study indicates that chatbots vary in length, quality, and readability. They answer each question in their own way, based on the data they have pulled from the web. Reliability of the responses generated by chatbots is high. This suggests that people who want information from a chatbot need to be careful and verify the answers they receive, particularly when they ask about medical and health aspects.
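For context on how the four readability scales used in the METHODS are typically computed, the minimal sketch below scores a sample chatbot answer with the open-source Python textstat package. The package choice and the sample text are illustrative assumptions; the paper does not state which tool the authors used.

# Minimal sketch (assumption): scoring one chatbot answer with the four
# readability metrics named in the study, using `textstat` (pip install textstat).
# This is not the authors' actual analysis pipeline.
import textstat

answer = (
    "Psoriasis is a chronic inflammatory skin disease. Treatment options "
    "include topical corticosteroids, phototherapy, and systemic agents "
    "such as methotrexate or biologics, depending on disease severity."
)

scores = {
    # Higher = easier to read (roughly 0-100); the study reports 16.3-39.8.
    "Flesch Reading Ease": textstat.flesch_reading_ease(answer),
    # U.S. school-grade level needed to understand the text.
    "Flesch-Kincaid Grade Level": textstat.flesch_kincaid_grade(answer),
    # Approximate years of formal education needed on a first reading.
    "Gunning Fog Index": textstat.gunning_fog(answer),
    # Based on the share of words outside a list of familiar words.
    "Dale-Chall Score": textstat.dale_chall_readability_score(answer),
}

for name, value in scores.items():
    print(f"{name}: {value:.2f}")

Exact values depend on the sentence splitter and syllable counter an implementation uses, so scores from different tools may not match the study's figures precisely.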

Similar Articles

[1] Assessing the response quality and readability of chatbots in cardiovascular health, oncology, and psoriasis: A comparative study. Int J Med Inform. 2024-10
[2] Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware. Cureus. 2024-8-28
[3] Assessing the readability, reliability, and quality of artificial intelligence chatbot responses to the 100 most searched queries about cardiopulmonary resuscitation: An observational study. Medicine (Baltimore). 2024-5-31
[4] Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study. Cureus. 2024-7-4
[5] Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care. Medicine (Baltimore). 2024-8-16
[6] Assessment of online patient education materials from major ophthalmologic associations. JAMA Ophthalmol. 2015-4
[7] Readability and Information Quality in Cancer Information From a Free vs Paid Chatbot. JAMA Netw Open. 2024-7-1
[8] Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures. Cureus. 2024-3-23
[9] Can artificial intelligence models serve as patient information consultants in orthodontics? BMC Med Inform Decis Mak. 2024-7-29
[10] Readability analysis of ChatGPT's responses on lung cancer. Sci Rep. 2024-7-26

Cited By

[1] Development and Evaluation of a Retrieval-Augmented Generation-Based Electronic Medical Record Chatbot System. Healthc Inform Res. 2025-7
[2] Does ChatGPT help patients access reliable and comprehensive information about psoriasis? Proc (Bayl Univ Med Cent). 2025-6-20
[3] Evaluating the Quality of Cardiovascular Disease Information From AI Chatbots: A Comparative Study. Cureus. 2025-7-16
[4] Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis. PLoS One. 2025-6-18
