
Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT.

Author Information

Vurture Gregory, Jenkins Nicole, Ross James, Sansone Stephanie, Conner Ellen, Jacobson Nina, Smilen Scott, Baum Jonathan

Affiliations

Division of Urogynecology, Department of Urology, Montefiore Medical Center-Albert Einstein College of Medicine, 1250 Waters Place, Tower Two, 9th Floor, Bronx, NY 10460, USA.

Department of Obstetrics and Gynecology, Hackensack Meridian Health-Jersey Shore University Medical Center, Neptune City, NJ, USA.

Publication Information

Int Urogynecol J. 2025 Jun 18. doi: 10.1007/s00192-025-06184-0.

Abstract

INTRODUCTION AND HYPOTHESIS

Existing literature suggests that large language models such as Chat Generative Pre-trained Transformer (ChatGPT) may provide inaccurate and unreliable health care information. The literature on its performance in urogynecology is scarce. The aim of the present study was to assess ChatGPT's ability to accurately answer commonly asked urogynecology patient questions.

METHODS

An expert panel of five board-certified urogynecologists and two fellows developed ten questions commonly asked by patients in a urogynecology office. Questions were phrased using the diction and verbiage a patient might use when asking a question over the internet. ChatGPT responses were evaluated using the Brief DISCERN (BD) tool, a validated scoring system for online health care information. Scores ≥ 16 are consistent with good-quality content. Responses were graded on their accuracy and consistency with expert opinion and published guidelines.
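
The abstract does not specify how the responses were collected (web interface versus API) or which model version was queried. As a rough illustration of the collection step only, the sketch below submits patient-style questions through the OpenAI chat completions endpoint; the model name, the example questions, and the helper structure are assumptions for illustration, not the authors' protocol.

```python
# Minimal sketch of the response-collection step only, not the authors' protocol.
# Assumptions: the OpenAI Python SDK (v1+) as a stand-in for the ChatGPT web
# interface; the model name and example questions are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "Why do I leak urine when I cough or sneeze?",        # hypothetical patient-style question
    "Do I need surgery for a bulge I feel in my vagina?",  # hypothetical patient-style question
]

responses = {}
for q in questions:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the abstract does not name the model version
        messages=[{"role": "user", "content": q}],
    )
    responses[q] = reply.choices[0].message.content
```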

RESULTS

The average score across all ten questions was 18.9 ± 2.7. Nine out of ten (90%) questions had a response that was determined to be of good quality (BD ≥ 16). The lowest scoring topic was "Pelvic Organ Prolapse" (mean BD = 14.0 ± 2.0). The highest scoring topic was "Interstitial Cystitis" (mean BD = 22.0 ± 0). ChatGPT provided no references for its responses.
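
As a small arithmetic illustration of how the summary figures in this section are typically derived, the sketch below computes a mean ± SD Brief DISCERN score and the share of responses meeting the BD ≥ 16 good-quality threshold from a list of per-question scores. The scores shown are placeholders, not the study's raw data.

```python
# Summarising Brief DISCERN (BD) scores: mean ± SD and the share of responses
# meeting the good-quality threshold (BD >= 16). The scores below are
# placeholders for illustration, not the study's raw data.
from statistics import mean, stdev

GOOD_QUALITY_CUTOFF = 16
bd_scores = [22, 20, 19, 18, 21, 17, 19, 20, 19, 14]  # hypothetical per-question scores

avg = mean(bd_scores)
sd = stdev(bd_scores)  # sample standard deviation
good = sum(score >= GOOD_QUALITY_CUTOFF for score in bd_scores)

print(f"Mean BD = {avg:.1f} ± {sd:.1f}")
print(f"Good-quality responses: {good}/{len(bd_scores)} ({100 * good / len(bd_scores):.0f}%)")
```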

CONCLUSIONS

ChatGPT provided high-quality responses to 90% of the questions based on an expert panel's review with the BD tool. Nonetheless, given the evolving nature of this technology, continued analysis is crucial before ChatGPT can be accepted as accurate and reliable.

