Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis.

Authors

Sahin Ozdemir Meryem, Ozdemir Yusuf Emre

Affiliations

Department of Infectious Diseases and Clinical Microbiology, Basaksehir Cam and Sakura City Hospital, Istanbul, 34480, Turkey.

Department of Infectious Diseases and Clinical Microbiology, Bakirkoy Dr Sadi Konuk Training and Research Hospital, Istanbul, 34140, Turkey.

Publication information

Sci Rep. 2025 Jan 11;15(1):1712. doi: 10.1038/s41598-024-83575-1.

DOI:10.1038/s41598-024-83575-1
PMID:39799203
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11724965/
Abstract

This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three different categories. The first group includes "questions and answers (Q&As) for the public" determined by the Centers for Disease Control and Prevention (CDC). The second group includes strong recommendations of international guidelines. The third group includes frequently asked questions on social media platforms. The answers of the chatbots were evaluated by two different infectious diseases specialists on a scoring scale from 1 to 4. Cohen's kappa coefficient was calculated to assess inter-rater reliability. The reproducibility and correlation of answers generated by ChatGPT and Gemini were analyzed. ChatGPT and Gemini's mean scores (3.55 ± 0.83 vs. 3.57 ± 0.89, p = 0.260) and completely correct response rates (71.0% vs. 78.4%, p = 0.111) were similar. In addition, in subgroup analyses of the CDC questions section (90.1% vs. 91.9%, p = 0.752), the guideline questions section (49.4% vs. 61.4%, p = 0.140), and the social media platform questions section (82.5% vs. 90%, p = 0.335), the completely correct answer rates were similar. There was a moderate positive correlation between ChatGPT and Gemini chatbots' answers (r = 0.633, p < 0.001). Reproducibility rates of answers to questions were 91.3% in ChatGPT and 92% in Gemini (p = 0.710). According to Cohen's kappa test, there was a substantial inter-rater agreement for both ChatGPT (κ = 0.720) and Gemini (κ = 0.704). ChatGPT and Gemini successfully answered CDC questions and social media platform questions, but the correct answer rates were insufficient for guideline questions.
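The two statistics the abstract leans on most, Cohen's kappa for agreement between the two raters and a comparison of completely correct response rates, can be illustrated with a short sketch. Everything below is an assumption made for illustration: the rating lists are hypothetical, the counts 125 and 138 are back-calculated from the reported 71.0% and 78.4% of 176 questions, and the abstract does not state which test produced p = 0.111, so the Pearson chi-square used here is only one plausible choice.

```python
from collections import Counter
from math import erfc, sqrt


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical 1-to-4 scores from two raters on ten answers (not study data).
scores_rater_1 = [4, 4, 3, 2, 4, 1, 4, 3, 4, 4]
scores_rater_2 = [4, 4, 3, 3, 4, 1, 4, 2, 4, 4]
print(round(cohen_kappa(scores_rater_1, scores_rater_2), 3))

# Assumed counts: 125/176 (about 71.0%) vs. 138/176 (about 78.4%) completely correct.
chi2, p_value = chi_square_2x2(125, 176 - 125, 138, 176 - 138)
print(round(chi2, 3), round(p_value, 3))
```

Under these assumed counts the chi-square p-value comes out close to the reported 0.111, but since both chatbots answered the same questions, a paired test such as McNemar's would need the per-question results, which the abstract does not give.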

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/592f/11724965/440f91d247e9/41598_2024_83575_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/592f/11724965/dbff60e9887b/41598_2024_83575_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/592f/11724965/08f1cd2ebfa3/41598_2024_83575_Fig2_HTML.jpg