Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests.

Authors

Pavone M, Palmieri L, Bizzarri N, Rosati A, Campolo F, Innocenzi C, Taliento C, Restaino S, Catena U, Vizzielli G, Akladios C, Ianieri M M, Marescaux J, Campo R, Fanfani F, Scambia G

Publication information

Facts Views Vis Obgyn. 2024 Dec;16(4):449-456. doi: 10.52054/FVVO.16.4.052.

DOI: 10.52054/FVVO.16.4.052
PMID: 39718328
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11819790/
Abstract

BACKGROUND

In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Although it is valuable for generating information, concerns persist about its authenticity and accuracy. Its undisclosed information sources and outdated dataset pose a risk of misinformation, and inaccuracies in AI-generated text raise doubts about its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.

OBJECTIVE

This study aimed to assess the accuracy of ChatGPT in answering the GESEA Level 1 and Level 2 knowledge tests.

MATERIALS AND METHODS

ChatGPT was presented with the 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 and asked to select the correct answer and explain its choice. Expert gynaecologists evaluated and graded the explanations for accuracy.

MAIN OUTCOME MEASURES

ChatGPT answered 59% of the questions correctly, and 64% of its responses included a comprehensive explanation. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).
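
The relationship between the per-level and overall accuracy figures can be checked with a short calculation. The sketch below is illustrative only: it assumes an even split of 50 questions per level, which the abstract does not state, and the correct-answer counts (32 and 27) are the values implied by the reported 64% and 54% under that assumption, not data taken from the study.

```python
from fractions import Fraction


def accuracy(correct: int, total: int) -> Fraction:
    """Return the share of correctly answered questions as an exact fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(correct, total)


# ASSUMPTION: 50 questions per level; the abstract reports only 100 in total.
QUESTIONS_PER_LEVEL = 50

# ASSUMPTION: counts implied by the reported 64% and 54% under an even split.
correct_by_level = {"GESEA Level 1": 32, "GESEA Level 2": 27}

per_level = {
    level: accuracy(n, QUESTIONS_PER_LEVEL)
    for level, n in correct_by_level.items()
}
overall = accuracy(
    sum(correct_by_level.values()),
    QUESTIONS_PER_LEVEL * len(correct_by_level),
)

for level, acc in per_level.items():
    print(f"{level}: {float(acc):.0%}")
print(f"Overall: {float(overall):.0%}")
```

Under these assumptions the two per-level rates average to the reported overall rate of 59%. With an uneven split between levels, the overall figure would instead be a weighted average of the two.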

CONCLUSIONS

ChatGPT is a versatile tool in medicine and research, offering knowledge and information and promoting evidence-based practice. Despite its widespread use, its accuracy has not yet been validated. This study found a 59% correct response rate, highlighting the need for accuracy validation and for attention to ethical use. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to support continuous improvement.

WHAT IS NEW?

Artificial intelligence (AI) has great potential in scientific research. However, the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to encourage critical use of the tool.
