Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.

Authors

Bragazzi Nicola Luigi, Buchinger Michèle, Atwan Hisham, Tuma Ruba, Chirico Francesco, Szarpak Lukasz, Farah Raymond, Khamisy-Farah Rola

Affiliations

Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, ON, Canada.

Department of Computer Science, Data Science, and Information Technology, Faculty of Natural and Applied Sciences, Sol Plaatje University, Kimberley, South Africa.

Publication information

JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.


DOI:10.2196/56126
PMID:39794312
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11840386/
Abstract

BACKGROUND: The COVID-19 pandemic has significantly strained health care systems globally, leading to an overwhelming influx of patients and exacerbating resource limitations. Concurrently, an "infodemic" of misinformation, particularly prevalent in women's health, has emerged. This challenge has been pivotal for health care providers, especially gynecologists and obstetricians, in managing pregnant women's health. The pandemic heightened risks for pregnant women from COVID-19, necessitating balanced advice from specialists on vaccine safety versus known risks. In addition, the advent of generative artificial intelligence (AI), such as large language models (LLMs), offers promising support in health care. However, these models require rigorous testing.

OBJECTIVE: This study aimed to assess LLMs' proficiency, clarity, and objectivity regarding COVID-19's impacts on pregnancy.

METHODS: This study evaluated 4 major AI prototypes (ChatGPT-3.5, ChatGPT-4, Microsoft Copilot, and Google Bard) using zero-shot prompts in a questionnaire validated among 159 Israeli gynecologists and obstetricians. The questionnaire assessed proficiency in providing accurate information on COVID-19 in relation to pregnancy. Text-mining, sentiment, and readability analyses (Flesch-Kincaid Grade Level and Flesch Reading Ease Score) were also conducted.

RESULTS: In terms of LLMs' knowledge, ChatGPT-4 and Microsoft Copilot each scored 97% (32/33), Google Bard 94% (31/33), and ChatGPT-3.5 82% (27/33). ChatGPT-4 incorrectly stated an increased risk of miscarriage due to COVID-19. Google Bard and Microsoft Copilot had minor inaccuracies concerning COVID-19 transmission and complications. In the sentiment analysis, Microsoft Copilot achieved the least negative score (-4), followed by ChatGPT-4 (-6) and Google Bard (-7), while ChatGPT-3.5 obtained the most negative score (-12). Finally, in the readability analysis, the Flesch-Kincaid Grade Level and Flesch Reading Ease Score showed that Microsoft Copilot was the most accessible at 9.9 and 49, followed by ChatGPT-4 at 12.4 and 37.1, while ChatGPT-3.5 (12.9 and 35.6) and Google Bard (12.9 and 35.8) generated particularly complex responses.

CONCLUSIONS: The study highlights varying knowledge levels of LLMs in relation to COVID-19 and pregnancy. ChatGPT-3.5 showed the least knowledge and alignment with scientific evidence. Readability and complexity analyses suggest that each AI's approach was tailored to specific audiences, with ChatGPT versions being more suitable for specialized readers and Microsoft Copilot for the general public. Sentiment analysis revealed notable variations in the way LLMs communicated critical information, underscoring the essential role of neutral and objective health care communication in ensuring that pregnant women, particularly vulnerable during the COVID-19 pandemic, receive accurate and reassuring guidance. Overall, ChatGPT-4, Microsoft Copilot, and Google Bard generally provided accurate, updated information on COVID-19 and vaccines in maternal and fetal health, aligning with health guidelines. The study demonstrated the potential role of AI in supplementing health care knowledge, with a need for continuous updating and verification of AI knowledge bases. The choice of AI tool should consider the target audience and the level of detail required.
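The two readability metrics named in the abstract are standard formulas over word, sentence, and syllable counts, and the knowledge scores are simple proportions of 33 questionnaire items. The sketch below is illustrative only: it is not the authors' analysis code, the function names are hypothetical, and the word, sentence, and syllable counts in the readability example are made-up inputs rather than data from the study. Only the correct-answer counts (32, 32, 31, 27 out of 33) come from the abstract.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score: higher values indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def percent_correct(correct: int, total: int) -> int:
    """Share of correct answers, rounded to the nearest whole percent."""
    return round(100 * correct / total)


# Correct-answer counts reported in the abstract (out of 33 items).
TOTAL_ITEMS = 33
correct_counts = {
    "ChatGPT-4": 32,
    "Microsoft Copilot": 32,
    "Google Bard": 31,
    "ChatGPT-3.5": 27,
}
for model, correct in correct_counts.items():
    print(f"{model}: {percent_correct(correct, TOTAL_ITEMS)}% ({correct}/{TOTAL_ITEMS})")

# Hypothetical text statistics (NOT from the study), used only to exercise the formulas.
words, sentences, syllables = 100, 5, 150
print("Reading ease:", flesch_reading_ease(words, sentences, syllables))
print("Grade level:", flesch_kincaid_grade(words, sentences, syllables))
```

Both formulas depend only on average sentence length and average syllables per word, so longer sentences and longer words raise the grade level and lower the reading ease. This matches the pattern in the reported results, where the model with the lowest grade level (Microsoft Copilot, 9.9) also has the highest reading ease score (49).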

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/aee6/11840386/5a333228a95e/formative_v9i1e56126_fig1.jpg


