Gunesli Irmak, Aksun Seren, Fathelbab Jana, Yildiz Bulent Okan
Hacettepe University School of Medicine, Department of Internal Medicine, Ankara, Turkey.
Hacettepe University School of Medicine, Division of Endocrinology and Metabolism, Ankara, Turkey.
Endocrine. 2025 Apr;88(1):315-322. doi: 10.1007/s12020-024-04121-7. Epub 2024 Dec 2.
Artificial intelligence (AI) is increasingly utilized in healthcare, with models like ChatGPT and Google Gemini gaining global popularity. Polycystic ovary syndrome (PCOS) is a prevalent condition that requires both lifestyle modifications and medical treatment, highlighting the critical need for effective patient education. This study compares the responses of ChatGPT-4, ChatGPT-3.5 and Gemini to PCOS-related questions using the latest guideline. Evaluating AI's integration into patient education necessitates assessing response quality, reliability, readability and effectiveness in managing PCOS.
To evaluate the accuracy, quality, readability and tendency to hallucinate of responses from ChatGPT-4, ChatGPT-3.5 and Gemini to questions about PCOS and its assessment and management, based on recommendations from the current international PCOS guideline.
This cross-sectional study assessed ChatGPT-4, ChatGPT-3.5, and Gemini's responses to PCOS-related questions created by endocrinologists using the latest guidelines and common patient queries. Experts evaluated the responses for accuracy, quality and tendency to hallucinate using Likert scales, while readability was analyzed using standard formulas.
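The abstract does not say which readability formulas were applied. As an illustration only, the sketch below assumes two widely used ones, Flesch Reading Ease and Flesch-Kincaid Grade Level, and computes them for a chatbot response. The function names and the vowel-group syllable heuristic are hypothetical simplifications, not the authors' tooling. Dedicated readability software counts syllables more carefully, so scores will differ slightly.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a trailing silent 'e' ("syndrome"), but not "-le" or "-ee" endings.
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher Reading Ease means easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Grade Level approximates the US school grade needed to follow the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    # Hypothetical chatbot answer, used only to exercise the functions.
    sample = "PCOS is a common hormonal condition. Regular exercise can help."
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

Because the two scores run in opposite directions (higher Reading Ease is easier, higher Grade Level is harder), a comparison across chatbots has to state which metric a "higher readability score" refers to.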
ChatGPT-4 and ChatGPT-3.5 attained higher scores in accuracy and quality than Gemini (p = 0.001, p < 0.001 and p = 0.007, p < 0.001 respectively). However, Gemini obtained a higher readability score than the other chatbots (p < 0.001). The tendency-to-hallucinate scores also differed significantly among the chatbots, a difference driven by Gemini's lower scores (p = 0.003).
The high accuracy and quality of responses provided by ChatGPT-4 and 3.5 to questions about PCOS suggest that they could be supportive in clinical practice. Future technological advancements may facilitate the use of artificial intelligence in both educating patients with PCOS and supporting the management of the disorder.