
Frequently asked questions on erectile dysfunction: evaluating artificial intelligence answers with expert mentorship.

Author Information

Baturu Muharrem, Solakhan Mehmet, Kazaz Tanyeli Guneyligil, Bayrak Omer

Affiliations

Department of Urology, University of Gaziantep, Gaziantep, Turkey.

Department of Urology, Hasan Kalyoncu University, Gaziantep, Turkey.

Publication Information

Int J Impot Res. 2025 Apr;37(4):310-314. doi: 10.1038/s41443-024-00898-3. Epub 2024 May 7.

Abstract

The present study assessed the accuracy of artificial intelligence-generated responses to frequently asked questions on erectile dysfunction. A cross-sectional analysis involved 56 erectile dysfunction-related questions searched on Google, categorized into nine sections: causes, diagnosis, treatment options, treatment complications, protective measures, relationship with other illnesses, treatment costs, treatment with herbal agents, and appointments. Responses from ChatGPT 3.5, ChatGPT 4, and BARD were evaluated by two experienced urology experts for accuracy, relevance, and comprehensibility using the F1 score and the global quality score (GQS). ChatGPT 3.5 and ChatGPT 4 achieved higher GQS than BARD in categories such as causes (4.5 ± 0.54, 4.5 ± 0.51, and 3.15 ± 1.01, respectively; p < 0.001), treatment options (4.35 ± 0.6, 4.5 ± 0.43, and 2.71 ± 1.38, respectively; p < 0.001), protective measures (5.0 ± 0, 5.0 ± 0, and 4 ± 0.5, respectively; p = 0.013), relationships with other illnesses (4.58 ± 0.58, 4.83 ± 0.25, and 3.58 ± 0.8, respectively; p = 0.006), and treatment with herbal agents (3 ± 0.61, 3.33 ± 0.83, and 1.8 ± 1.09, respectively; p = 0.043). F1 scores in the categories of causes (1), diagnosis (0.857), treatment options (0.726), and protective measures (1) indicated alignment with the guidelines. There was no significant difference between ChatGPT 3.5 and ChatGPT 4 in answer quality, but both outperformed BARD on the GQS. These results emphasize the need to continually enhance and validate AI-generated medical information, underscoring the importance of artificial intelligence systems in delivering reliable information on erectile dysfunction.
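The per-category F1 scores reported above quantify agreement between AI-generated answers and guideline recommendations; F1 is the standard harmonic mean of precision and recall. A minimal sketch of the computation (the input values below are hypothetical illustrations, not figures from this study):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: answers judged against guideline criteria
# with precision 0.75 and recall 0.70 in some category.
print(round(f1_score(0.75, 0.70), 3))  # prints 0.724

# Perfect agreement (as in the causes and protective-measures
# categories, which scored 1) corresponds to precision = recall = 1.
print(f1_score(1.0, 1.0))  # prints 1.0
```

A category F1 of 1 therefore means every guideline-relevant point was covered and nothing extraneous was counted as guideline-aligned, under whatever agreement labels the raters assigned.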

