
Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey.

Affiliations

Department of Pediatrics, "G. D'Annunzio" University of Chieti-Pescara, Chieti, Italy.

Division of Pediatric Rheumatology, "G. D'Annunzio" University of Chieti-Pescara, Chieti, Italy.

Publication information

Pediatr Rheumatol Online J. 2024 Aug 23;22(1):78. doi: 10.1186/s12969-024-01011-0.


DOI: 10.1186/s12969-024-01011-0
PMID: 39180115
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11342667/

Abstract

BACKGROUND: Artificial intelligence (AI) has become a popular tool for clinical and research use in the medical field. The aim of this study was to evaluate the accuracy and reliability of a generative AI tool on pediatric familial Mediterranean fever (FMF).

METHODS: Fifteen questions repeated thrice on pediatric FMF were prompted to the popular generative AI tool Microsoft Copilot with Chat-GPT 4.0. Nine pediatric rheumatology experts rated response accuracy with a blinded mechanism using a Likert-like scale with values from 1 to 5.

RESULTS: Median values for overall responses at the initial assessment ranged from 2.00 to 5.00. During the second assessment, median values spanned from 2.00 to 4.00, while for the third assessment, they ranged from 3.00 to 4.00. Intra-rater variability showed poor to moderate agreement (intraclass correlation coefficient range: -0.151 to 0.534). A diminishing level of agreement among experts over time was documented, as highlighted by Krippendorff's alpha coefficient values, ranging from 0.136 (at the first response) to 0.132 (at the second response) to 0.089 (at the third response). Lastly, experts displayed varying levels of trust in AI pre- and post-survey.

CONCLUSIONS: AI has promising implications in pediatric rheumatology, including early diagnosis and management optimization, but challenges persist due to uncertain information reliability and the lack of expert validation. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses regarding FMF, with poor intra- and extra-rater reliability. Human validation remains crucial in managing AI-generated medical information.
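For readers unfamiliar with the agreement statistic reported in the results, the following is a minimal sketch of how Krippendorff's alpha can be computed for a set of ratings. It is not the authors' analysis code. The abstract does not say which distance metric was used, so the squared-difference (interval) metric here is an assumption, and the function name and the example ratings are hypothetical and are not the study's data.

```python
from itertools import permutations


def krippendorff_alpha(units, distance=lambda a, b: (a - b) ** 2):
    """Krippendorff's alpha = 1 - D_o / D_e.

    units: one list of ratings per rated item (e.g. per AI response);
           missing ratings are simply left out of the list.
    distance: disagreement between two values. The squared difference
              (interval metric) used by default is an ASSUMPTION here.
    """
    # Items with fewer than two ratings cannot be paired and are dropped.
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: ordered pairs within each item,
    # each item weighted by 1 / (number of ratings - 1).
    d_o = sum(
        sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: ordered pairs across all ratings pooled.
    d_e = sum(distance(a, b) for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("ratings show no variation; alpha is undefined")

    return 1 - d_o / d_e


# Hypothetical Likert ratings (1 to 5): three responses, three raters each.
example = [[4, 4, 5], [2, 3, 2], [5, 4, 4]]
print(round(krippendorff_alpha(example), 3))
```

An alpha of 1 means perfect agreement and values near 0 mean agreement no better than chance, which is the range the reported values (0.136, 0.132, 0.089) fall in.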

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc9/11342667/aaf445a5f129/12969_2024_1011_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc9/11342667/747ef17bc4c3/12969_2024_1011_Fig1_HTML.jpg
