

Evaluating AI chatbots in penis enhancement information: a comparative analysis of readability, reliability and quality.

Authors

Kayra Mehmet Vehbi, Anil Hakan, Ozdogan Ilturk, Baradia Suhail Mohamed Amin, Toksoz Serdar

Affiliations

Department of Urology, Baskent University Adana Dr. Turgut Noyan Application and Research Center, Adana, Turkey.

Department of Urology, Adana City Hospital, Adana, Turkey.

Publication Information

Int J Impot Res. 2025 Jun 3. doi: 10.1038/s41443-025-01098-3.
PMID: 40461830
Abstract

This study aims to evaluate and compare the performance of artificial intelligence chatbots by assessing the reliability and quality of the information they provide regarding penis enhancement (PE). Search trends for keywords related to PE were determined using Google Trends ( https://trends.google.com ) and Semrush ( https://www.semrush.com ). Data covering a ten-year period was analyzed, taking into account regional trends and changes in search volume. Based on these trends, 25 questions were selected and categorized into three groups: general information (GI), surgical treatment (ST) and myths/misconceptions (MM). These questions were posed to three advanced chatbots: ChatGPT-4, Gemini Pro and Llama 3.1. Responses from each model were analyzed for readability using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease Score (FRES), while the quality of the responses was evaluated using the Ensuring Quality Information for Patients (EQIP) tool and the Modified DISCERN Score. All chatbot responses exhibited difficulty in readability and understanding according to FKGL and FRES, with no statistically significant differences among them (FKGL: p = 0.167; FRES: p = 0.366). Llama achieved the highest median Modified DISCERN score (4 [IQR:1]), significantly outperforming ChatGPT (3 [IQR:0]) and Gemini (3 [IQR:2]) (p < 0.001). Pairwise comparisons showed no significant difference between ChatGPT and Gemini (p = 0.070), but Llama was superior to both (p < 0.001). In EQIP scores, Llama also scored highest (73.8 ± 2.2), significantly surpassing ChatGPT (68.7 ± 2.1) and Gemini (54.2 ± 1.3) (p < 0.001). 
Across categories, Llama consistently achieved higher EQIP scores (GI:71.1 ± 1.6; ST: 73.6 ± 4.1; MM: 76.3 ± 2.1) and Modified DISCERN scores (GI:4 [IQR:0]; ST:4 [IQR:1]; MM:3 [IQR:1]) compared to ChatGPT (EQIP: GI:68.4 ± 1.1; ST: 65.7 ± 2.2; MM:71.1 ± 1.7; Modified DISCERN: GI:3 [IQR:1]; ST:3 [IQR:1]; MM:3 [IQR:0]) and Gemini (EQIP: GI:55.2 ± 1.4; ST:55.2 ± 1.6; MM:2.6 ± 2.5; Modified DISCERN: GI:1 [IQR:2]; ST:1 [IQR:2]; MM:3 [IQR:0]) (p < 0.001). This study highlights Llama's superior reliability in providing PE-related health information, though all chatbots struggled with readability.
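The readability metrics reported above (FKGL and FRES) are computed from the same two text statistics: average words per sentence and average syllables per word. As a rough illustration of how these standard published formulas work (this is not the tool the study used, and the vowel-group syllable counter is only a crude heuristic — validated software uses pronunciation dictionaries), a minimal Python sketch:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, with a crude silent-e adjustment.
    Real readability tools use pronunciation dictionaries; this approximates."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1 and not word.endswith(("le", "ee")):
        n -= 1
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRES) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    return round(fkgl, 2), round(fres, 2)
```

A higher FKGL means a higher US school-grade level is needed to follow the text, while a lower FRES means harder reading — so the finding that all three chatbots "exhibited difficulty in readability" corresponds to high FKGL and low FRES values.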


Similar Articles

1
Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.
PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351.
2
The Reliability Gap: How Traditional Search Engines Outperform Artificial Intelligence (AI) Chatbots in Rosacea Public Health Information Quality.
Cureus. 2025 Jun 22;17(6):e86543. doi: 10.7759/cureus.86543.
3
Evaluation of Information Provided by ChatGPT Versions on Traumatic Dental Injuries for Dental Students and Professionals.
Dent Traumatol. 2025 Aug;41(4):427-436. doi: 10.1111/edt.13042.
4
Parental education in pediatric dysphagia: A comparative analysis of three large language models.
J Pediatr Gastroenterol Nutr. 2025 Jul;81(1):18-26. doi: 10.1002/jpn3.70069.
5
Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174.
6
Quality of Information on Wilms Tumor From Artificial Intelligence Chatbots: What Are Your Patients and Their Families Reading?
Urology. 2025 Apr;198:130-134. doi: 10.1016/j.urology.2025.01.054.
7
Assessing the Quality and Readability of Online Patient Information: ENT UK Patient Information e-Leaflets versus Responses by a Generative Artificial Intelligence.
Facial Plast Surg. 2024 Oct 15. doi: 10.1055/a-2413-3675.
8
Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study.
JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980.
9
Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9.
