


Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology.

Author information

Sav Nadide Melike

Affiliation

Department of Pediatric Nephrology, Duzce University, Duzce, Turkey.

Publication information

Pediatr Nephrol. 2025 Mar 5. doi: 10.1007/s00467-025-06723-3.

DOI: 10.1007/s00467-025-06723-3
PMID: 40045013
Abstract

BACKGROUND

Artificial intelligence (AI) has emerged as a transformative tool in healthcare, offering significant advancements in providing accurate clinical information. However, the performance and applicability of AI models in specialized fields such as pediatric nephrology remain underexplored. This study is aimed at evaluating the ability of two AI-based language models, GPT-3.5 and GPT-4, to provide accurate and reliable clinical information in pediatric nephrology. The models were evaluated on four criteria: accuracy, scope, patient friendliness, and clinical applicability.

METHODS

Forty pediatric nephrology specialists with ≥ 5 years of experience rated GPT-3.5 and GPT-4 responses to 10 clinical questions using a 1-5 scale via Google Forms. Ethical approval was obtained, and informed consent was secured from all participants.

RESULTS

Both GPT-3.5 and GPT-4 demonstrated comparable performance across all criteria, with no statistically significant differences observed (p > 0.05). GPT-4 exhibited slightly higher mean scores in all parameters, but the differences were negligible (Cohen's d < 0.1 for all criteria). Reliability analysis revealed low internal consistency for both models (Cronbach's alpha ranged between 0.019 and 0.162). Correlation analysis indicated no significant relationship between participants' years of professional experience and their evaluations of GPT-3.5 (correlation coefficients ranged from -0.026 to 0.074).
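For readers unfamiliar with the effect-size and reliability statistics reported above, the following sketch shows how a pooled Cohen's d and Cronbach's alpha are conventionally computed. The rating values are invented placeholders for illustration only, not the study's data.

```python
import math

# Hypothetical 1-5 expert ratings for each model (illustrative only).
gpt35 = [3, 4, 2, 4, 3, 3, 4, 2]
gpt4  = [4, 4, 3, 4, 3, 4, 4, 3]

def mean(xs):
    return sum(xs) / len(xs)

def _var(xs):
    # Sample variance (n - 1 denominator).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * _var(a) + (nb - 1) * _var(b)) / (na + nb - 2))
    return (mean(b) - mean(a)) / pooled

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of columns (one list per item/question)."""
    k = len(items)                       # number of items
    n = len(items[0])                    # number of respondents
    item_vars = sum(_var(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_vars / _var(totals))

print(cohens_d(gpt35, gpt4))
```

A d below 0.1, as reported, means the two rating distributions differ by less than a tenth of a pooled standard deviation; an alpha near 0 (the study's 0.019-0.162 range) means raters' item scores barely correlate with one another.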

CONCLUSIONS

While GPT-3.5 and GPT-4 provided a foundational level of clinical information support, neither model exhibited superior performance in addressing the unique challenges of pediatric nephrology. The findings highlight the need for domain-specific training and integration of updated clinical guidelines to enhance the applicability and reliability of AI models in specialized fields. This study underscores the potential of AI in pediatric nephrology while emphasizing the importance of human oversight and the need for further refinements in AI applications.


Similar articles

1. Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology. Pediatr Nephrol. 2025 Mar 5. doi: 10.1007/s00467-025-06723-3.
2. Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study. J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
3. Thyroid Eye Disease and Artificial Intelligence: A Comparative Study of ChatGPT-3.5, ChatGPT-4o, and Gemini in Patient Information Delivery. Ophthalmic Plast Reconstr Surg. 2024 Dec 24. doi: 10.1097/IOP.0000000000002882.
4. Potential of ChatGPT in youth mental health emergency triage: Comparative analysis with clinicians. PCN Rep. 2025 Jul 15;4(3):e70159. doi: 10.1002/pcn5.70159. eCollection 2025 Sep.
5. Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study. JMIR Med Inform. 2025 Jul 24;13:e68980. doi: 10.2196/68980.
6. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
7. Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education. J Med Internet Res. 2025 Jul 15;27:e74299. doi: 10.2196/74299.
8. Artificial intelligence for detecting keratoconus. Cochrane Database Syst Rev. 2023 Nov 15;11(11):CD014911. doi: 10.1002/14651858.CD014911.pub2.
9. AI in Qualitative Health Research Appraisal: Comparative Study. JMIR Form Res. 2025 Jul 8;9:e72815. doi: 10.2196/72815.
10. Large Language Models and Empathy: Systematic Review. J Med Internet Res. 2024 Dec 11;26:e52597. doi: 10.2196/52597.
