Department of Obstetrics and Gynecology, Shaare Zedek Medical Center, Affiliated with the Hebrew University School of Medicine, Jerusalem, Israel.
Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Hamilton Health Sciences, McMaster University, Hamilton, Ontario, Canada.
Int J Gynaecol Obstet. 2024 Sep;166(3):1127-1133. doi: 10.1002/ijgo.15501. Epub 2024 Mar 25.
To evaluate the quality of ChatGPT responses to common issues in obstetrics and to assess its ability to provide reliable responses to pregnant individuals. The study aimed to examine the responses based on expert opinion using predetermined criteria: "accuracy," "completeness," and "safety."
We curated 15 common and potentially clinically significant questions that pregnant women frequently ask. Two native English-speaking women were asked to reframe the questions in their own words, and we used the ChatGPT language model to generate responses to them. To evaluate the accuracy, completeness, and safety of the responses generated by ChatGPT, we developed a questionnaire on which obstetrics and gynecology experts from different countries were invited to rate each response on a scale of 1 to 5. The ratings were analyzed to determine the average level of agreement and the percentage of positive ratings (≥4) for each criterion.
Of the 42 experts invited, 20 responded to the questionnaire. The combined score across all responses yielded a mean rating of 4, with 75% of responses receiving a positive rating (≥4). Examining the specific criteria, the ChatGPT responses performed best on the accuracy criterion, with a mean rating of 4.2 and 80% of questions receiving a positive rating. The responses scored lower on the completeness criterion, with a mean rating of 3.8 and 46.7% of questions receiving a positive rating. For safety, the mean rating was 3.9, with 53.3% of questions receiving a positive rating. No response received an average rating below three.
This study demonstrates promising results regarding the potential use of ChatGPT in providing accurate responses to obstetric clinical questions posed by pregnant women. However, it is crucial to exercise caution when addressing inquiries concerning the safety of the fetus or the mother.