Pavone M, Palmieri L, Bizzarri N, Rosati A, Campolo F, Innocenzi C, Taliento C, Restaino S, Catena U, Vizzielli G, Akladios C, Ianieri MM, Marescaux J, Campo R, Fanfani F, Scambia G
Facts Views Vis Obgyn. 2024 Dec;16(4):449-456. doi: 10.52054/FVVO.16.4.052.
In 2022, OpenAI released ChatGPT 3.5, which is now widely used in medical education, training, and research. Although it is a valuable tool for generating information, concerns persist about its authenticity and accuracy: its undisclosed sources and outdated training data pose a risk of misinformation, and inaccuracies in AI-generated text cast doubt on its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.
This study aimed to assess the accuracy of ChatGPT on tests 1 and 2 of the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) programme.
The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were submitted to ChatGPT, which was asked to select the correct answer and to explain its choice. Expert gynaecologists then evaluated and graded each explanation for accuracy.
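The abstract does not describe an automated pipeline, and in the study the explanations were graded by expert gynaecologists rather than by script. Purely as a minimal sketch of how such a batch evaluation could be reproduced programmatically, the snippet below submits multiple-choice questions to an OpenAI model. The file name `gesea_questions.json` and its layout are hypothetical, and `gpt-3.5-turbo` is assumed here as the API stand-in for the ChatGPT 3.5 interface.

```python
# Minimal sketch (not the authors' method): batch-submitting GESEA-style
# multiple-choice questions to an OpenAI model and recording its answers.
# Assumes a hypothetical local file "gesea_questions.json" shaped like:
#   [{"question": "...",
#     "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
#     "correct": "B"}, ...]
import json

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(question: str, options: dict[str, str]) -> str:
    """Ask the model to pick one option and explain its choice."""
    prompt = (
        f"{question}\n"
        + "\n".join(f"{letter}. {text}" for letter, text in options.items())
        + "\nSelect the single correct option (A-D) and explain your reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


with open("gesea_questions.json") as f:
    questions = json.load(f)

correct = 0
for item in questions:
    answer = ask(item["question"], item["options"])
    # Crude automatic scoring: check whether the reply opens with the
    # correct letter. In the study, the explanations were graded by hand.
    if answer.strip().upper().startswith(item["correct"]):
        correct += 1

print(f"Accuracy: {correct}/{len(questions)} "
      f"= {100 * correct / len(questions):.0f}%")
```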
ChatGPT answered 59% of the questions correctly, and 64% of its responses included a comprehensive explanation. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).
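The abstract does not state how the 100 questions were divided between the two levels; assuming an even 50/50 split, the per-level accuracies are consistent with the overall figure:

```latex
% Consistency check, assuming 50 questions per GESEA level (not stated in the abstract):
0.64 \times 50 + 0.54 \times 50 = 32 + 27 = 59 \quad \Rightarrow \quad 59/100 = 59\%
```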
ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy has not yet been validated. This study found a 59% correct-response rate, highlighting the need for accuracy validation and for attention to ethical use. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to drive continuous improvement.
WHAT IS NEW?: Artificial intelligence (AI) has great potential in scientific research, but the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to promote the critical use of this tool.