
Assessing the readability, quality and reliability of responses produced by ChatGPT, Gemini, and Perplexity regarding most frequently asked keywords about low back pain.

Authors

Ozduran Erkan, Hancı Volkan, Erkin Yüksel, Özbek İlhan Celil, Abdulkerimov Vugar

Affiliations

Physical Medicine and Rehabilitation, Pain Medicine, Sivas Numune Hospital, Sivas, Turkey.

Anesthesiology and Reanimation, Critical Care Medicine, Dokuz Eylül University, Izmir, Turkey.

Publication

PeerJ. 2025 Jan 22;13:e18847. doi: 10.7717/peerj.18847. eCollection 2025.

DOI: 10.7717/peerj.18847
PMID: 39866564
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11760201/
Abstract

BACKGROUND

Patients who are informed about the causes, pathophysiology, treatment, and prevention of a disease are better able to participate in their own care when illness occurs. Artificial intelligence (AI), which has gained popularity in recent years, is defined as the study of algorithms that give machines the ability to reason and perform cognitive functions, including object and word recognition, problem solving, and decision making. This study aimed to examine the readability, reliability, and quality of responses to frequently asked keywords about low back pain (LBP) given by three AI-based chatbots (ChatGPT, Perplexity, and Gemini), which are popular tools for obtaining information online today.

METHODS

All three AI chatbots were asked the 25 most frequently used keywords related to LBP, identified with the help of Google Trends. To prevent possible bias from sequential processing of the keywords, each keyword was submitted by a different user (EO, VH). The readability of the responses was determined with the Simple Measure of Gobbledygook (SMOG), Flesch Reading Ease Score (FRES), and Gunning Fog (GFG) readability scores. Quality was assessed using the Global Quality Score (GQS) and the Ensuring Quality Information for Patients (EQIP) score. Reliability was assessed with the DISCERN and Journal of the American Medical Association (JAMA) scales.
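The three readability indices used here are standard formulas over sentence, word, and syllable counts. A minimal Python sketch of how such scores are computed (using a naive vowel-group syllable counter as an approximation; the study itself used validated scoring tools, so treat this only as an illustration of the formulas):

```python
import math
import re

def count_syllables(word: str) -> int:
    """Approximate syllable count via vowel groups (naive heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # drop a typical silent final 'e'
    return max(n, 1)

def readability(text: str) -> dict:
    """Compute FRES, SMOG, and Gunning Fog scores for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    n_sent, n_words, n_syll = len(sentences), len(words), sum(syllables)
    # Words of 3+ syllables count as "complex"/polysyllabic.
    n_poly = sum(1 for s in syllables if s >= 3)
    fres = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)
    smog = 1.0430 * math.sqrt(n_poly * (30 / n_sent)) + 3.1291
    gfg = 0.4 * ((n_words / n_sent) + 100 * (n_poly / n_words))
    return {"FRES": round(fres, 1), "SMOG": round(smog, 1), "GFG": round(gfg, 1)}
```

Higher FRES means easier text, while SMOG and Gunning Fog estimate the school grade level required; the study's finding that chatbot responses exceed a 6th-grade level corresponds to low FRES and high SMOG/GFG values.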

RESULTS

The top three keywords identified by the Google Trends search were "Lower Back Pain", "ICD 10 Low Back Pain", and "Low Back Pain Symptoms". The readability grade level of the responses given by all AI chatbots was higher than the recommended 6th-grade level (p < 0.001). In the EQIP, JAMA, modified DISCERN, and GQS score evaluations, Perplexity scored significantly higher than the other chatbots (p < 0.001).

CONCLUSION

The answers given by AI chatbots to keywords about LBP were found to be difficult to read and to score low on reliability and quality assessments. As new chatbots are introduced, they could provide better guidance to patients through increased clarity and text quality. This study may inspire future work on improving the algorithms and responses of AI chatbots.
