Pavone M, Palmieri L, Bizzarri N, Rosati A, Campolo F, Innocenzi C, Taliento C, Restaino S, Catena U, Vizzielli G, Akladios C, Ianieri MM, Marescaux J, Campo R, Fanfani F, Scambia G
Facts Views Vis Obgyn. 2024 Dec;16(4):449-456. doi: 10.52054/FVVO.16.4.052.
In 2022, OpenAI released ChatGPT 3.5, which is now widely used in medical education, training, and research. Although it is a valuable tool for generating information, concerns persist about its authenticity and accuracy: its undisclosed sources and outdated training data pose a risk of misinformation, and inaccuracies in AI-generated text cast doubt on its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.
This study aimed to assess the accuracy of ChatGPT on tests 1 and 2 of the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) programme.
The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were submitted to ChatGPT, which was asked to select the correct answer and to explain its choice. Expert gynaecologists then evaluated and graded each explanation for accuracy.
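The abstract does not describe an automated pipeline, and in the study the explanations were graded by expert gynaecologists rather than by script. Purely as a minimal sketch of how such a batch evaluation could be reproduced programmatically, the snippet below submits multiple-choice questions to an OpenAI model. The file name `gesea_questions.json` and its layout are hypothetical, and `gpt-3.5-turbo` is assumed here as the API stand-in for the ChatGPT 3.5 interface.

```python
# Minimal sketch (not the authors' method): batch-submitting GESEA-style
# multiple-choice questions to an OpenAI model and recording its answers.
# Assumes a hypothetical local file "gesea_questions.json" shaped like:
#   [{"question": "...",
#     "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
#     "correct": "B"}, ...]
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, options: dict[str, str]) -> str:
    """Ask the model to pick one option and explain its choice."""
    prompt = (
        f"{question}\n"
        + "\n".join(f"{letter}. {text}" for letter, text in options.items())
        + "\nSelect the single correct option (A-D) and explain your reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


with open("gesea_questions.json") as f:
    questions = json.load(f)

correct = 0
for item in questions:
    answer = ask(item["question"], item["options"])
    # Crude automatic scoring: check whether the reply opens with the
    # correct letter. In the study, the explanations were graded by hand.
    if answer.strip().upper().startswith(item["correct"]):
        correct += 1

print(f"Accuracy: {correct}/{len(questions)} "
      f"= {100 * correct / len(questions):.0f}%")
```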
ChatGPT answered 59% of the questions correctly, and 64% of its responses included a comprehensive explanation. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).
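The abstract does not state how the 100 questions were divided between the two levels; assuming an even 50/50 split, the per-level accuracies are consistent with the overall figure:

```latex
% Consistency check, assuming 50 questions per GESEA level (not stated in the abstract):
0.64 \times 50 + 0.54 \times 50 = 32 + 27 = 59 \quad \Rightarrow \quad 59/100 = 59\%
```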
ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy has not yet been validated. This study found a 59% correct-response rate, highlighting the need for accuracy validation and for attention to ethical use. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to drive continuous improvement.
WHAT IS NEW?: Artificial intelligence (AI) has great potential in scientific research, but the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to promote the critical use of this tool.