Lee D, Brown M, Hammond J, Zakowski M
Department of Anesthesiology, 8700 Beverly Blvd #4209, Cedars-Sinai Medical Center, Los Angeles, CA 90064, United States.
Int J Obstet Anesth. 2025 Feb;61:104317. doi: 10.1016/j.ijoa.2024.104317. Epub 2024 Dec 20.
Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.
Twenty questions for the generative AI chatbots were derived from frequently asked questions on professional society, hospital and consumer websites. ChatGPT and Bard were queried in November 2023. Answers were graded for accuracy by four obstetric anesthesiologists. Quality was measured using the Patient Education Materials Assessment Tool for Print (PEMAT). Readability was measured using six readability indices. Accuracy, quality and readability were compared using independent t-tests.
Bard readability scores were at a high school level, significantly easier to read than ChatGPT's college level by all scoring metrics (P < 0.001). Bard gave significantly longer answers (P < 0.001), yet accuracy was similar for Bard (85% ± 10) and ChatGPT (87% ± 14) (P = 0.5). PEMAT understandability scores were not statistically significantly different (P = 0.06). PEMAT actionability scores were significantly higher for Bard than for ChatGPT (22% vs. 9%, P = 0.007).
CONCLUSION: Answers to questions about "labor epidurals" should be accurate, high quality, and easy to read. Bard, at a high school reading level, was well above the fourth to sixth grade reading level suggested for patient materials. Consumers, health care providers, hospitals and governmental agencies should be aware of the quality of information generated by chatbots. Chatbots should meet readability and understandability standards for health-related questions, to aid public understanding and enhance shared decision-making.
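The abstract does not specify which six readability indices or which statistical software were used, so the following Python sketch is illustrative only: it shows one widely used readability index (the Flesch-Kincaid grade level) and an independent two-sample t-test on hypothetical per-answer scores, roughly mirroring the comparison described in the methods.

    # Illustrative sketch only; the per-answer grade levels below are hypothetical,
    # not the study's data, and Flesch-Kincaid is just one common readability index.
    import re
    from scipy import stats

    def count_syllables(word: str) -> int:
        """Rough syllable count based on vowel groups (heuristic, not dictionary-based)."""
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    def flesch_kincaid_grade(text: str) -> float:
        """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59

    # Hypothetical grade levels for each chatbot's answers to the twenty questions (truncated here).
    bard_grades = [9.8, 10.2, 11.0, 9.5, 10.7]
    chatgpt_grades = [13.1, 14.0, 12.8, 13.5, 14.2]

    # Independent (two-sample) t-test, as described in the methods.
    t_stat, p_value = stats.ttest_ind(bard_grades, chatgpt_grades)
    print(f"t = {t_stat:.2f}, P = {p_value:.4f}")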