
Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT.

Author Information

Vurture Gregory, Jenkins Nicole, Ross James, Sansone Stephanie, Conner Ellen, Jacobson Nina, Smilen Scott, Baum Jonathan

Affiliations

Division of Urogynecology, Department of Urology, Montefiore Medical Center-Albert Einstein College of Medicine, 1250 Waters Place, Tower Two, 9th Floor, Bronx, NY 10460, USA.

Department of Obstetrics and Gynecology, Hackensack Meridian Health-Jersey Shore University Medical Center, Neptune City, NJ, USA.

Publication Information

Int Urogynecol J. 2025 Jun 18. doi: 10.1007/s00192-025-06184-0.

Abstract

INTRODUCTION AND HYPOTHESIS

Existing literature suggests that large language models such as Chat Generative Pre-trained Transformer (ChatGPT) may provide inaccurate and unreliable health care information. The literature on its performance in urogynecology is scarce. The aim of the present study was to assess ChatGPT's ability to accurately answer commonly asked urogynecology patient questions.

METHODS

An expert panel of five board-certified urogynecologists and two fellows developed ten questions commonly asked by patients in a urogynecology office. Questions were phrased using the diction and verbiage a patient might use when asking a question over the internet. ChatGPT responses were evaluated using the Brief DISCERN (BD) tool, a validated scoring system for online health care information. Scores ≥ 16 are consistent with good-quality content. Responses were graded on their accuracy and consistency with expert opinion and published guidelines.
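
The abstract does not specify how the responses were collected (web interface versus API) or which model version was queried. As a rough illustration of the collection step only, the sketch below submits patient-style questions through the OpenAI chat completions endpoint; the model name, the example questions, and the helper structure are assumptions for illustration, not the authors' protocol.

```python
# Minimal sketch of the response-collection step only, not the authors' protocol.
# Assumptions: the OpenAI Python SDK (v1+) as a stand-in for the ChatGPT web
# interface; the model name and example questions are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "Why do I leak urine when I cough or sneeze?",        # hypothetical patient-style question
    "Do I need surgery for a bulge I feel in my vagina?",  # hypothetical patient-style question
]

responses = {}
for q in questions:
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the abstract does not name the model version
        messages=[{"role": "user", "content": q}],
    )
    responses[q] = reply.choices[0].message.content
```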

RESULTS

The average score across all ten questions was 18.9 ± 2.7. Nine out of ten (90%) questions had a response that was determined to be of good quality (BD ≥ 16). The lowest scoring topic was "Pelvic Organ Prolapse" (mean BD = 14.0 ± 2.0). The highest scoring topic was "Interstitial Cystitis" (mean BD = 22.0 ± 0). ChatGPT provided no references for its responses.
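
As a small arithmetic illustration of how the summary figures in this section are typically derived, the sketch below computes a mean ± SD Brief DISCERN score and the share of responses meeting the BD ≥ 16 good-quality threshold from a list of per-question scores. The scores shown are placeholders, not the study's raw data.

```python
# Summarising Brief DISCERN (BD) scores: mean ± SD and the share of responses
# meeting the good-quality threshold (BD >= 16). The scores below are
# placeholders for illustration, not the study's raw data.
from statistics import mean, stdev

GOOD_QUALITY_CUTOFF = 16
bd_scores = [22, 20, 19, 18, 21, 17, 19, 20, 19, 14]  # hypothetical per-question scores

avg = mean(bd_scores)
sd = stdev(bd_scores)  # sample standard deviation
good = sum(score >= GOOD_QUALITY_CUTOFF for score in bd_scores)

print(f"Mean BD = {avg:.1f} ± {sd:.1f}")
print(f"Good-quality responses: {good}/{len(bd_scores)} ({100 * good / len(bd_scores):.0f}%)")
```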

CONCLUSIONS

ChatGPT provided high-quality responses to 90% of the questions based on an expert panel's review with the BD tool. Nonetheless, given the evolving nature of this technology, continued analysis is crucial before ChatGPT can be accepted as accurate and reliable.

