Devranoglu Belgin, Gurbuz Tugba, Gokmen Oya
Department of Obstetrics and Gynecology, Zeynep Kamil Maternity/Children, Education and Training Hospital, Istanbul 34480, Turkey.
Department of Gynecology and Obstetrics Clinic, Medistate Hospital, Istanbul 34820, Turkey.
Diagnostics (Basel). 2024 May 22;14(11):1082. doi: 10.3390/diagnostics14111082.
This study assesses the efficacy of ChatGPT-4, an advanced artificial intelligence (AI) language model, in delivering precise and comprehensive answers to questions about the management of polycystic ovary syndrome (PCOS)-related infertility. The research team, comprising experienced gynecologists, formulated 460 structured queries covering a wide range of common and intricate PCOS scenarios. The queries comprised true/false (170), open-ended (165), and multiple-choice (125) items and were further classified as 'easy', 'moderate', or 'hard'. For true/false questions, ChatGPT-4 achieved an accuracy rate of 100% both initially and upon reassessment after 30 days. In the open-ended category, accuracy scores improved significantly, from 5.53 ± 0.89 initially to 5.88 ± 0.43 at the 30-day mark (p < 0.001). Completeness scores for open-ended queries also improved significantly, rising from 2.35 ± 0.58 to 2.92 ± 0.29 (p < 0.001). In the multiple-choice category, the accuracy score declined slightly, from 5.96 ± 0.44 to 5.92 ± 0.63 after 30 days, but the change was not statistically significant (p > 0.05). Completeness scores for multiple-choice questions remained consistent, with initial and 30-day means of 2.98 ± 0.18 and 2.97 ± 0.25, respectively (p > 0.05). ChatGPT-4 demonstrated exceptional performance on true/false queries and significantly improved its handling of open-ended questions over the 30-day interval. These findings emphasize the potential of AI, particularly ChatGPT-4, to enhance decision-making support for healthcare professionals managing PCOS-related infertility.