
Comparison of the performances between ChatGPT and Gemini in answering questions on viral hepatitis.

Author information

Sahin Ozdemir Meryem, Ozdemir Yusuf Emre

Affiliations

Department of Infectious Diseases and Clinical Microbiology, Basaksehir Cam and Sakura City Hospital, Istanbul, 34480, Turkey.

Department of Infectious Diseases and Clinical Microbiology, Bakirkoy Dr Sadi Konuk Training and Research Hospital, Istanbul, 34140, Turkey.

Publication information

Sci Rep. 2025 Jan 11;15(1):1712. doi: 10.1038/s41598-024-83575-1.

Abstract

This is the first study to evaluate the adequacy and reliability of the ChatGPT and Gemini chatbots on viral hepatitis. A total of 176 questions were composed from three categories. The first group comprised "questions and answers (Q&As) for the public" published by the Centers for Disease Control and Prevention (CDC). The second group comprised strong recommendations from international guidelines. The third group comprised frequently asked questions from social media platforms. The chatbots' answers were evaluated by two infectious diseases specialists on a scoring scale from 1 to 4, and Cohen's kappa coefficient was calculated to assess inter-rater reliability. The reproducibility and correlation of the answers generated by ChatGPT and Gemini were also analyzed. ChatGPT and Gemini had similar mean scores (3.55 ± 0.83 vs. 3.57 ± 0.89, p = 0.260) and completely correct response rates (71.0% vs. 78.4%, p = 0.111). In subgroup analyses, the completely correct answer rates were likewise similar for the CDC questions (90.1% vs. 91.9%, p = 0.752), the guideline questions (49.4% vs. 61.4%, p = 0.140), and the social media platform questions (82.5% vs. 90.0%, p = 0.335). There was a moderate positive correlation between the two chatbots' answers (r = 0.633, p < 0.001). Reproducibility rates were 91.3% for ChatGPT and 92.0% for Gemini (p = 0.710). Cohen's kappa test showed substantial inter-rater agreement for both ChatGPT (κ = 0.720) and Gemini (κ = 0.704). ChatGPT and Gemini answered the CDC and social media platform questions successfully, but their correct answer rates for the guideline questions were insufficient.
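The inter-rater agreement statistic reported above (κ = 0.720 and κ = 0.704) is Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. A minimal sketch of the computation is below; the two raters' 1-to-4 scores are hypothetical illustrative data, not the study's ratings:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the chance agreement derived from each
    rater's marginal category frequencies.
    """
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the marginal frequency of each category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 1-4 adequacy scores from two raters for ten answers.
scores_a = [4, 4, 3, 2, 4, 1, 3, 4, 2, 4]
scores_b = [4, 4, 3, 2, 3, 1, 3, 4, 2, 4]
print(round(cohens_kappa(scores_a, scores_b), 3))  # → 0.855
```

By the commonly used Landis–Koch benchmarks, values between 0.61 and 0.80 (as in the study) indicate substantial agreement.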


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/592f/11724965/dbff60e9887b/41598_2024_83575_Fig1_HTML.jpg
