Lee D, Brown M, Hammond J, Zakowski M
Department of Anesthesiology, 8700 Beverly Blvd #4209, Cedars-Sinai Medical Center, Los Angeles, CA 90064, United States.
Int J Obstet Anesth. 2025 Feb;61:104317. doi: 10.1016/j.ijoa.2024.104317. Epub 2024 Dec 20.
Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.
Twenty questions for the generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
Bard readability scores were at a high school level, significantly easier to read than ChatGPT's college level by all scoring metrics (P < 0.001). Bard gave significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not statistically significantly different (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
CONCLUSION: Answers to questions about "labor epidurals" should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the fourth to sixth grade reading level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet readability and understandability standards for health-related questions, to aid public understanding and enhance shared decision-making.
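The abstract does not specify which six readability indices or which statistical software were used, so the following Python sketch is illustrative only: it shows one widely used readability index (the Flesch-Kincaid grade level) and an independent two-sample t-test on hypothetical per-answer scores, roughly mirroring the comparison described in the methods.

    # Illustrative sketch only; the per-answer grade levels below are hypothetical,
    # not the study's data, and Flesch-Kincaid is just one common readability index.
    import re
    from scipy import stats

    def count_syllables(word: str) -> int:
        """Rough syllable count based on vowel groups (heuristic, not dictionary-based)."""
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    def flesch_kincaid_grade(text: str) -> float:
        """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

    # Hypothetical grade levels for each chatbot's answers to the twenty questions (truncated here).
    bard_grades = [9.8, 10.2, 11.0, 9.5, 10.7]
    chatgpt_grades = [13.1, 14.0, 12.8, 13.5, 14.2]

    # Independent (two-sample) t-test, as described in the methods.
    t_stat, p_value = stats.ttest_ind(bard_grades, chatgpt_grades)
    print(f"t = {t_stat:.2f}, P = {p_value:.4f}")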