Department of Woman, Child and of General and Specialized Surgery, Università Degli Studi Della Campania "Luigi Vanvitelli", Via Luigi De Crecchio 2, 80138, Naples, Italy.
Pediatr Nephrol. 2025 Jan;40(1):151-157. doi: 10.1007/s00467-024-06486-3. Epub 2024 Aug 16.
We aimed to evaluate the baseline performance and improvement of ChatGPT-4 "omni" (ChatGPT-4o) and Gemini 1.5 Flash (Gemini 1.5) in answering multiple-choice questions related to pediatric nephrology after specific training.
Using the multiple-choice questions from the "Educational Review" articles published in Pediatric Nephrology between January 2014 and April 2024, the models were tested both before and after specific training with Portable Document Format (PDF) and plain-text (TXT) versions of the articles; a Python script removed the last page of each article, which contained the correct answers. The number of correct answers was recorded.
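The authors do not publish their preprocessing script. As a minimal sketch of the answer-removal step for the TXT versions, assuming pages in the extracted text are separated by form-feed characters (as common PDF-to-text extractors emit), the final answer page could be dropped like this:

```python
def strip_answer_page(text: str) -> str:
    """Drop the final page (the one listing the correct answers) from an
    article whose pages are separated by form-feed (\f) characters.

    This is a hypothetical reconstruction, not the authors' actual script.
    """
    pages = text.split("\f")
    if len(pages) < 2:  # nothing to strip if the text is a single page
        return text
    return "\f".join(pages[:-1])


# Hypothetical three-page article; the last page holds the answer key.
article = "Intro page\fQuestions page\fAnswers: 1-b 2-c"
print(strip_answer_page(article))  # prints the first two pages only
```

For the PDF versions, the same idea would apply at the page-object level (e.g., copying all pages except the last with a PDF library) rather than on raw text.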
Before training, ChatGPT-4o correctly answered 75.2% of the 1395 questions, outperforming Gemini 1.5, which answered 64.9% correctly (p < 0.001). After training with PDF files, ChatGPT-4o's accuracy increased to 77.8%, while Gemini 1.5 improved significantly to 84.7% (p < 0.001). Training with TXT files showed similar results, with ChatGPT-4o maintaining 77.8% accuracy and Gemini 1.5 further improving to 87.6% (p < 0.001).
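The reported p-values are consistent with a standard two-proportion comparison on the 1395 questions. As a quick check (assuming each model answered the same 1395 questions, and using a pooled two-proportion z-test, which the abstract does not name), the baseline comparison can be verified with the standard library alone:

```python
import math


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: p1 == p2, using a pooled
    two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


n = 1395
# Baseline accuracy: ChatGPT-4o 75.2% vs. Gemini 1.5 64.9% correct.
p = two_proportion_p(round(0.752 * n), n, round(0.649 * n), n)
print(p < 0.001)  # True: consistent with the reported p < 0.001
```

The same calculation applied to the post-training percentages likewise yields p-values well below 0.001, matching the abstract's figures.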
The study highlights that while ChatGPT-4o has strong baseline performance, specific training does not significantly enhance its accuracy. Conversely, Gemini 1.5, despite its lower initial performance, improves substantially with training, particularly with TXT files. These findings suggest that Gemini 1.5 is better able to store and retrieve supplied information, making it potentially more effective in clinical applications, although its best performance depends on being given additional data.