Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey.

Affiliations

Department of Pediatrics, "G. D'Annunzio" University of Chieti-Pescara, Chieti, Italy.

Division of Pediatric Rheumatology, "G. D'Annunzio" University of Chieti-Pescara, Chieti, Italy.

Publication information

Pediatr Rheumatol Online J. 2024 Aug 23;22(1):78. doi: 10.1186/s12969-024-01011-0.

DOI: 10.1186/s12969-024-01011-0
PMID: 39180115
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11342667/

Abstract

BACKGROUND

Artificial intelligence (AI) has become a popular tool for clinical and research use in the medical field. The aim of this study was to evaluate the accuracy and reliability of a generative AI tool on pediatric familial Mediterranean fever (FMF).

METHODS

Fifteen questions repeated thrice on pediatric FMF were prompted to the popular generative AI tool Microsoft Copilot with Chat-GPT 4.0. Nine pediatric rheumatology experts rated response accuracy with a blinded mechanism using a Likert-like scale with values from 1 to 5.

RESULTS

Median values for overall responses at the initial assessment ranged from 2.00 to 5.00. During the second assessment, median values spanned from 2.00 to 4.00, while for the third assessment, they ranged from 3.00 to 4.00. Intra-rater variability showed poor to moderate agreement (intraclass correlation coefficient range: -0.151 to 0.534). A diminishing level of agreement among experts over time was documented, as highlighted by Krippendorff's alpha coefficient values, ranging from 0.136 (at the first response) to 0.132 (at the second response) to 0.089 (at the third response). Lastly, experts displayed varying levels of trust in AI pre- and post-survey.
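
As a rough illustration of how an agreement coefficient such as the Krippendorff's alpha values reported above can be computed, the sketch below implements the interval-metric form of alpha for complete data (no missing ratings). The function name, the toy ratings, and the choice of the squared-difference (interval) metric are assumptions made for illustration; the abstract does not state which variant or software the authors used, and an ordinal metric may be more appropriate for Likert-like scores.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` is a list of lists: one inner list of numeric ratings per rated
    item (for example, one AI response scored by several experts).
    Hypothetical helper for illustration only.
    """
    # Items with fewer than two ratings contain no pairable values.
    units = [list(u) for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: squared differences between ratings of the
    # same item, each item weighted by 1 / (m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences between all pairs of
    # ratings pooled across items.
    pooled = [v for u in units for v in u]
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1.0 - d_o / d_e


# Toy data: three items, two raters each (not taken from the study).
toy_ratings = [[1, 1], [2, 2], [1, 2]]
print(round(krippendorff_alpha_interval(toy_ratings), 3))  # 0.444
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, which is why coefficients of 0.136, 0.132, and 0.089 are read as poor agreement among the experts.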

CONCLUSIONS

AI has promising implications in pediatric rheumatology, including early diagnosis and management optimization, but challenges persist due to uncertain information reliability and the lack of expert validation. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses regarding FMF, with poor intra- and extra-rater reliability. Human validation remains crucial in managing AI-generated medical information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc9/11342667/aaf445a5f129/12969_2024_1011_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bc9/11342667/747ef17bc4c3/12969_2024_1011_Fig1_HTML.jpg

Similar articles

1. Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey.
   Pediatr Rheumatol Online J. 2024 Aug 23;22(1):78. doi: 10.1186/s12969-024-01011-0.
2. Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models.
   Cureus. 2023 Nov 24;15(11):e49373. doi: 10.7759/cureus.49373. eCollection 2023 Nov.
3. Validation of the Quality Analysis of Medical Artificial Intelligence (QAMAI) tool: a new tool to assess the quality of health information provided by AI platforms.
   Eur Arch Otorhinolaryngol. 2024 Nov;281(11):6123-6131. doi: 10.1007/s00405-024-08710-0. Epub 2024 May 4.
4. Effectiveness of Generative Artificial Intelligence-Driven Responses to Patient Concerns in Long-Term Opioid Therapy: Cross-Model Assessment.
   Biomedicines. 2025 Mar 5;13(3):636. doi: 10.3390/biomedicines13030636.
5. Assessing the Quality and Reliability of ChatGPT's Responses to Radiotherapy-Related Patient Queries: Comparative Study With GPT-3.5 and GPT-4.
   JMIR Cancer. 2025 Apr 16;11:e63677. doi: 10.2196/63677.
6. Evaluation of ChatGPT as a Reliable Source of Medical Information on Prostate Cancer for Patients: Global Comparative Survey of Medical Oncologists and Urologists.
   Urol Pract. 2025 Mar;12(2):229-240. doi: 10.1097/UPJ.0000000000000740. Epub 2024 Nov 7.
7. Evaluation of Generative Artificial Intelligence Models in Predicting Pediatric Emergency Severity Index Levels.
   Pediatr Emerg Care. 2025 Apr 1;41(4):251-255. doi: 10.1097/PEC.0000000000003315. Epub 2025 Jan 7.
8. Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.
   JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.
9. Accuracy of generative artificial intelligence models in differential diagnoses of familial Mediterranean fever and deficiency of Interleukin-1 receptor antagonist.
   J Transl Autoimmun. 2023 Oct 14;7:100213. doi: 10.1016/j.jtauto.2023.100213. eCollection 2023 Dec.
10. Theory of trust and acceptance of artificial intelligence technology (TrAAIT): An instrument to assess clinician trust and acceptance of artificial intelligence.
    J Biomed Inform. 2023 Dec;148:104550. doi: 10.1016/j.jbi.2023.104550. Epub 2023 Nov 20.

Cited by

1. Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis.
   PLoS One. 2025 Jun 18;20(6):e0326351. doi: 10.1371/journal.pone.0326351. eCollection 2025.
2. Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.
   JMIR Perioper Med. 2025 Jun 12;8:e70047. doi: 10.2196/70047.
3. Is ChatGPT a Reliable Tool for Explaining Medical Terms?
   Cureus. 2025 Jan 10;17(1):e77258. doi: 10.7759/cureus.77258. eCollection 2025 Jan.
4. Familial Mediterranean fever in children from central-southern Italy: a multicentric retrospective cohort study.
   Clin Rheumatol. 2024 Dec;43(12):3983-3992. doi: 10.1007/s10067-024-07207-9. Epub 2024 Oct 29.
