Comparative evaluation of ChatGPT-4, ChatGPT-3.5 and Google Gemini on PCOS assessment and management based on recommendations from the 2023 guideline.

Authors

Gunesli Irmak, Aksun Seren, Fathelbab Jana, Yildiz Bulent Okan

Affiliations

Hacettepe University School of Medicine, Department of Internal Medicine, Ankara, Turkey.

Hacettepe University School of Medicine, Division of Endocrinology and Metabolism, Ankara, Turkey.

Publication information

Endocrine. 2025 Apr;88(1):315-322. doi: 10.1007/s12020-024-04121-7. Epub 2024 Dec 2.

Abstract

CONTEXT

Artificial intelligence (AI) is increasingly utilized in healthcare, with models like ChatGPT and Google Gemini gaining global popularity. Polycystic ovary syndrome (PCOS) is a prevalent condition that requires both lifestyle modifications and medical treatment, highlighting the critical need for effective patient education. This study compares the responses of ChatGPT-4, ChatGPT-3.5 and Gemini to PCOS-related questions using the latest guideline. Evaluating AI's integration into patient education necessitates assessing response quality, reliability, readability and effectiveness in managing PCOS.

PURPOSE

To evaluate the accuracy, quality, readability and tendency to hallucinate of responses from ChatGPT-4, ChatGPT-3.5 and Gemini to questions about PCOS and its assessment and management, based on recommendations from the current international PCOS guideline.

METHODS

This cross-sectional study assessed ChatGPT-4, ChatGPT-3.5, and Gemini's responses to PCOS-related questions created by endocrinologists using the latest guidelines and common patient queries. Experts evaluated the responses for accuracy, quality and tendency to hallucinate using Likert scales, while readability was analyzed using standard formulas.

RESULTS

ChatGPT-4 and ChatGPT-3.5 scored higher than Gemini in accuracy and quality (p = 0.001, p < 0.001 and p = 0.007, p < 0.001 respectively). However, Gemini obtained a higher readability score than the other chatbots (p < 0.001). Tendency-to-hallucinate scores also differed significantly among the chatbots, a difference driven by Gemini's lower scores (p = 0.003).

CONCLUSION

The high accuracy and quality of responses provided by ChatGPT-4 and 3.5 to questions about PCOS suggest that they could be supportive in clinical practice. Future technological advancements may facilitate the use of artificial intelligence in both educating patients with PCOS and supporting the management of the disorder.
