

Expert evaluation of ChatGPT accuracy and reliability for basic celiac disease frequently asked questions.

Author Information

Mahmoudi Ghehsareh Mohadeseh, Asri Nastaran, Azizmohammad Looha Mehdi, Sadeghi Amir, Ciacci Carolina, Rostami-Nejad Mohammad

Affiliations

Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Celiac Disease and Gluten Related Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Publication Information

Sci Rep. 2025 Aug 14;15(1):29871. doi: 10.1038/s41598-025-15898-6.


DOI: 10.1038/s41598-025-15898-6
PMID: 40813612
Abstract

Artificial Intelligence's (AI) role in providing information on Celiac Disease (CD) remains understudied. This study aimed to evaluate the accuracy and reliability of ChatGPT-3.5 in generating responses to 20 basic CD-related queries. This study assessed ChatGPT-3.5, the dominant publicly accessible version during the study period, to establish a benchmark for AI-assisted CD education. The accuracy of ChatGPT's responses to twenty frequently asked questions (FAQs) was assessed by two independent experts using a Likert scale, followed by categorization based on CD management domains. Inter-rater reliability (agreement between experts) was determined through cross-tabulation, Cohen's kappa, and Wilcoxon signed-rank tests. Intra-rater reliability (agreement within the same expert) was evaluated using the Friedman test with post hoc comparisons. ChatGPT demonstrated high accuracy in responding to CD FAQs, with expert ratings predominantly ranging from 4 to 5. While overall performance was strong, responses to management strategies excelled compared to those related to disease etiology. Inter-rater reliability analysis revealed moderate agreement between the two experts in evaluating ChatGPT's responses (κ = 0.22, p-value = 0.026). Although both experts consistently assigned high scores across different CD management categories, subtle discrepancies emerged in specific instances. Intra-rater reliability analysis indicated high consistency in scoring for one expert (F=0.113), while the other exhibited some variability (F<0.001). ChatGPT exhibits potential as a reliable source of information for CD patients, particularly in the domain of disease management.
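
For readers who want to run a comparable analysis on their own rating data, the sketch below shows how the reliability statistics named in the abstract (Cohen's kappa, the Wilcoxon signed-rank test, and the Friedman test) can be computed in Python with scipy and scikit-learn. The Likert ratings here are randomly generated placeholders, not the study's data, and the variable names and rating setup are illustrative assumptions rather than the authors' actual procedure.

```python
# Minimal sketch: inter- and intra-rater reliability on hypothetical
# 5-point Likert ratings of 20 ChatGPT answers (placeholder data only).
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical ratings: two experts score the same 20 answers on a 1-5 scale.
expert_a = rng.integers(3, 6, size=20)
expert_b = rng.integers(4, 6, size=20)

# Inter-rater reliability: agreement between the two experts.
kappa = cohen_kappa_score(expert_a, expert_b)   # Cohen's kappa
try:
    w_stat, w_p = wilcoxon(expert_a, expert_b)  # paired Wilcoxon signed-rank test
except ValueError:                              # raised if every pair of scores is identical
    w_stat, w_p = float("nan"), float("nan")
print(f"Cohen's kappa = {kappa:.2f}")
print(f"Wilcoxon signed-rank: statistic = {w_stat}, p = {w_p:.3f}")

# Intra-rater reliability: one expert re-scores the same answers in three rounds.
round_1 = expert_a
round_2 = np.clip(expert_a + rng.integers(-1, 2, size=20), 1, 5)
round_3 = np.clip(expert_a + rng.integers(-1, 2, size=20), 1, 5)
f_stat, f_p = friedmanchisquare(round_1, round_2, round_3)  # Friedman test
print(f"Friedman test: chi-square = {f_stat:.2f}, p = {f_p:.3f}")
```

For ordinal Likert data, a weighted kappa (cohen_kappa_score with weights="quadratic") is also commonly used; the abstract does not state which variant the authors applied.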


Similar Articles

[1]
Expert evaluation of ChatGPT accuracy and reliability for basic celiac disease frequently asked questions.

Sci Rep. 2025-8-14

[2]
Evaluation of ChatGPT-4 as an Online Outpatient Assistant in Puerperal Mastitis Management: Content Analysis of an Observational Study.

JMIR Med Inform. 2025-7-24

[3]
Evaluating the validity and consistency of artificial intelligence chatbots in responding to patients' frequently asked questions in prosthodontics.

J Prosthet Dent. 2025-4-7

[4]
Using Artificial Intelligence ChatGPT to Access Medical Information About Chemical Eye Injuries: Comparative Study.

JMIR Form Res. 2025-8-13

[5]
Evaluating the novel role of ChatGPT-4 in addressing corneal ulcer queries: An AI-powered insight.

Eur J Ophthalmol. 2025-4-28

[6]
Assessing ChatGPT's Educational Potential in Lung Cancer Radiotherapy From Clinician and Patient Perspectives: Content Quality and Readability Analysis.

JMIR Cancer. 2025-8-13

[7]
Pharmacy meets AI: Effect of a drug information activity on student perceptions of generative artificial intelligence.

Curr Pharm Teach Learn. 2025-7-7

[8]
Potential of ChatGPT in youth mental health emergency triage: Comparative analysis with clinicians.

PCN Rep. 2025-7-15

[9]
Thyroid Eye Disease and Artificial Intelligence: A Comparative Study of ChatGPT-3.5, ChatGPT-4o, and Gemini in Patient Information Delivery.

Ophthalmic Plast Reconstr Surg. 2024-12-24

[10]
Evaluating ChatGPT's Utility in Biologic Therapy for Systemic Lupus Erythematosus: Comparative Study of ChatGPT and Google Web Search.

JMIR Form Res. 2025-8-28
