Comparing answers of ChatGPT and Google Gemini to common questions on benign anal conditions.

Author Information

Maron C M, Emile S H, Horesh N, Freund M R, Pellino G, Wexner S D

Affiliations

Trinity College, Hartford, CT, USA.

Ellen Leifer Shulman and Steven Shulman Digestive Disease Center, Cleveland Clinic Florida, 2950 Cleveland Clinic Blvd, Weston, FL, USA.

Publication Information

Tech Coloproctol. 2025 Jan 26;29(1):57. doi: 10.1007/s10151-024-03096-x.

Abstract

INTRODUCTION

Chatbots have been increasingly used as a source of patient education. This study aimed to compare the answers of ChatGPT-4 and Google Gemini to common questions on benign anal conditions in terms of appropriateness, comprehensiveness, and language level.

METHODS

Each chatbot was asked a set of 30 questions on hemorrhoidal disease, anal fissures, and anal fistulas. The responses were assessed for appropriateness, comprehensiveness, and provision of references by three subject experts who were blinded to the identity of the chatbots. The language level of the chatbot answers was assessed using the Flesch-Kincaid Reading Ease score and grade level.
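
For context, the two readability metrics are computed from the standard Flesch-Kincaid formulas (not stated in the abstract, but well-established definitions):

\text{Reading Ease} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)

\text{Grade Level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59

Higher Reading Ease scores indicate easier text, while the Grade Level corresponds to a U.S. school grade, so the 6th-grade target commonly recommended for patient education materials maps to a Grade Level of about 6.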

RESULTS

Overall, the answers provided by both models were appropriate and comprehensive. However, the answers of Google Gemini were more appropriate, more comprehensive, and more often supported by references than those of ChatGPT. In addition, agreement among the assessors on the appropriateness of the Google Gemini answers was higher, indicating greater consistency. ChatGPT had a significantly higher Flesch-Kincaid grade level than Google Gemini (12.3 versus 10.6, p = 0.015) but a similar median Flesch-Kincaid Reading Ease score.

CONCLUSIONS

The answers of Google Gemini to questions on common benign anal conditions were more appropriate, more comprehensive, and more often supported by references than the answers of ChatGPT. The answers of both chatbots were above the 6th-grade reading level and may therefore be difficult for nonmedical individuals to comprehend.
