ChatGPT-3.5 Versus Google Bard: Which Large Language Model Responds Best to Commonly Asked Pregnancy Questions?

Author information

Khromchenko Keren, Shaikh Sameeha, Singh Meghana, Vurture Gregory, Rana Rima A, Baum Jonathan D

Affiliations

Obstetrics and Gynecology, Hackensack Meridian Jersey Shore University Medical Center, Neptune, USA.

Obstetrics and Gynecology, Hackensack Meridian School of Medicine, Nutley, USA.

Publication information

Cureus. 2024 Jul 27;16(7):e65543. doi: 10.7759/cureus.65543. eCollection 2024 Jul.

DOI: 10.7759/cureus.65543
PMID: 39188430
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11346960/

Abstract

Large language models (LLMs) have been widely used to provide information in many fields, including obstetrics and gynecology. Which model performs best in answering commonly asked pregnancy questions is unknown. A qualitative analysis of Chat Generative Pre-Training Transformer Version 3.5 (ChatGPT-3.5) (OpenAI, Inc., San Francisco, California, United States) and Bard, recently renamed Google Gemini (Google LLC, Mountain View, California, United States), was performed in August 2023. Each LLM was queried on 12 commonly asked pregnancy questions and asked for its references. The co-authors reviewed and graded the responses and references for both LLMs individually and then as a group to reach a consensus. Query responses were graded as "acceptable" or "not acceptable" based on correctness and completeness in comparison to American College of Obstetricians and Gynecologists (ACOG) publications, PubMed-indexed evidence, and clinical experience. References were classified as "verified," "broken," "irrelevant," "non-existent," or "no references." Grades of "acceptable" were given to 58% of ChatGPT-3.5 responses (seven out of 12) and 83% of Bard responses (10 out of 12). Regarding references, ChatGPT-3.5 had issues with 100% of its references, and Bard had discrepancies in 8% of its references (one out of 12). When ChatGPT-3.5 responses from May 2023 and August 2023 were compared, the share of "acceptable" responses changed from 50% to 58%. Bard answered more questions correctly than ChatGPT-3.5 when queried on a small sample of commonly asked pregnancy questions. ChatGPT-3.5 performed poorly in terms of reference verification. The overall performance of ChatGPT-3.5 remained stable over time, with approximately one-half of responses being "acceptable" in both May and August of 2023. Both LLMs need further evaluation and vetting before being accepted as accurate and reliable sources of information for pregnant women.
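
The percentages reported in the abstract can be checked against the stated counts. The following is a minimal sketch, not part of the study: the helper name `percent` and the dictionary layout are illustrative assumptions, and only the counts that the abstract states explicitly are used.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


TOTAL_QUESTIONS = 12  # questions posed to each model, per the abstract

# (count stated in the abstract, percentage stated in the abstract)
reported = {
    "ChatGPT-3.5 acceptable responses": (7, 58),
    "Bard acceptable responses": (10, 83),
    "Bard reference discrepancies": (1, 8),
}

for label, (count, stated) in reported.items():
    computed = percent(count, TOTAL_QUESTIONS)
    print(f"{label}: {count}/{TOTAL_QUESTIONS} = {computed}% (abstract states {stated}%)")
```

With only 12 questions per model, a single response changes the rate by roughly eight percentage points, which is consistent with the abstract's reading of the May to August change (50% versus 58%) as stable performance.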
