Establishing a novel score system and using it to assess and compare the quality of ChatGPT-4 consultation with physician consultation for obstetrics and gynecology: A pilot study.

Author information

Lan Lan, Yang Ling, Li Jinyan, Hou Jia, Yan Yunsheng, Zhang Yaozong

Affiliations

Department of Intensive Care Medicine, Women and Children's Hospital of Chongqing Medical University, Chongqing, China.

Emergency Department, Women and Children's Hospital of Chongqing Medical University, Chongqing, China.

Publication information

Int J Gynaecol Obstet. 2025 Mar;168(3):1251-1257. doi: 10.1002/ijgo.15934. Epub 2024 Sep 28.

Abstract

OBJECTIVES

In this study, we aimed to establish a quantitative scoring system for evaluating consultation quality. We then used this scoring system to assess the quality of ChatGPT-4 consultations and compared them with physician consultations for the same obstetric and gynecologic clinical cases.

METHODS

This study was conducted at the Women and Children's Hospital of Chongqing Medical University. It is a tertiary-care hospital with approximately 16 000-20 000 deliveries and 8500-12 000 gynecologic surgeries per year. ChatGPT-4 and physicians each analyzed the detailed data from obstetric and gynecologic medical records and generated their own consultation opinions. Eight junior doctors graded all consultation opinions using the novel scoring system. The correlation, agreement, and differences between the two types of consultation opinions were then evaluated.

RESULTS

A total of 100 obstetric and 100 gynecologic medical records were randomly selected. Pearson correlation analysis showed no correlation or only weak correlation between the ChatGPT-4 and physician consultations. Bland-Altman plots showed unacceptable agreement between the two types of consultation opinions. Paired t tests showed that physician consultations scored significantly higher than ChatGPT-4 consultations in both obstetric and gynecologic patients.
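The three analyses named above can be sketched as follows: Pearson correlation, Bland-Altman bias with 95% limits of agreement, and a paired t test on per-case scores. This is only an illustrative sketch. The abstract does not describe the scoring scale or how the eight raters' grades were combined, so the function names and the score lists in the usage example are hypothetical, not study data. The `1.96` multiplier is the conventional Bland-Altman choice. For a p-value, `scipy.stats.ttest_rel` could be used instead of the stdlib-only t statistic below.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x, y):
    """Bias (mean of x - y) and 95% limits of agreement: bias +/- 1.96 * SD of differences."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-case scores for the same five cases (illustrative only).
physician = [8.0, 7.5, 9.0, 6.5, 8.0]
chatgpt = [6.0, 7.0, 5.5, 6.0, 4.5]

r = pearson_r(physician, chatgpt)
bias, lower, upper = bland_altman(physician, chatgpt)
t, df = paired_t(physician, chatgpt)
print(f"r={r:.3f}  bias={bias:.2f} LoA=[{lower:.2f}, {upper:.2f}]  t={t:.3f} df={df}")
```

A positive bias and a positive t statistic here mean the first list (physician) scores higher on average. Whether the agreement is "acceptable" is a clinical judgment about the width of the limits of agreement, not something the code decides.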

CONCLUSION

At present, ChatGPT-4 may not be a substitute for physicians in consultations for obstetric and gynecologic patients. Careful attention and ongoing evaluation are therefore crucial to ensure the quality of consultation opinions generated by ChatGPT-4.
