Mahmoudi Ghehsareh Mohadeseh, Asri Nastaran, Azizmohammad Looha Mehdi, Sadeghi Amir, Ciacci Carolina, Rostami-Nejad Mohammad
Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Celiac Disease and Gluten Related Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Sci Rep. 2025 Aug 14;15(1):29871. doi: 10.1038/s41598-025-15898-6.
The role of artificial intelligence (AI) in providing information on celiac disease (CD) remains understudied. This study evaluated the accuracy and reliability of ChatGPT-3.5, the dominant publicly accessible version during the study period, in responding to 20 basic CD-related queries, with the aim of establishing a benchmark for AI-assisted CD education. Two independent experts rated the accuracy of ChatGPT's responses to the twenty frequently asked questions (FAQs) on a Likert scale, and the questions were then categorized by CD management domain. Inter-rater reliability (agreement between experts) was assessed using cross-tabulation, Cohen's kappa, and the Wilcoxon signed-rank test. Intra-rater reliability (consistency within each expert) was evaluated using the Friedman test with post hoc comparisons. ChatGPT demonstrated high accuracy in responding to CD FAQs, with expert ratings predominantly between 4 and 5. While overall performance was strong, responses on management strategies were rated higher than those on disease etiology. Inter-rater reliability analysis showed moderate agreement between the two experts (κ = 0.22, p = 0.026). Although both experts consistently assigned high scores across CD management categories, subtle discrepancies emerged in specific instances. Intra-rater reliability analysis indicated consistent scoring for one expert (p = 0.113) and some variability for the other (p < 0.001). ChatGPT shows potential as a reliable source of information for CD patients, particularly in the domain of disease management.
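To illustrate the reliability analyses named in the abstract, the short Python sketch below computes Cohen's kappa and a Wilcoxon signed-rank test on hypothetical Likert ratings from two experts across 20 responses, and a Friedman test across three hypothetical rating rounds by one expert. This is not the authors' code, and all rating values are invented for illustration only.

# Minimal, hypothetical sketch of the reliability analyses described in the abstract;
# the rating arrays are invented and do not reproduce the study's data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import wilcoxon, friedmanchisquare

# Hypothetical Likert ratings (1-5) from two experts for 20 FAQ responses.
expert_a = np.array([5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 4, 5, 5, 4])
expert_b = np.array([5, 4, 4, 5, 5, 5, 5, 4, 5, 4, 4, 5, 5, 4, 4, 5, 5, 5, 5, 4])

# Inter-rater reliability: chance-corrected agreement (Cohen's kappa) and a
# Wilcoxon signed-rank test for a systematic shift between the paired ratings.
kappa = cohen_kappa_score(expert_a, expert_b)
w_stat, w_p = wilcoxon(expert_a, expert_b, zero_method="zsplit")
print(f"Cohen's kappa = {kappa:.2f}, Wilcoxon p = {w_p:.3f}")

# Intra-rater reliability: Friedman test across three hypothetical rating rounds
# by the same expert; a non-significant p suggests consistent scoring over rounds.
round_1 = expert_a
round_2 = np.array([5, 5, 4, 5, 4, 5, 4, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 4])
round_3 = np.array([5, 5, 5, 5, 4, 5, 5, 4, 5, 5, 4, 4, 5, 5, 4, 5, 4, 5, 5, 4])
f_stat, f_p = friedmanchisquare(round_1, round_2, round_3)
print(f"Friedman chi-square = {f_stat:.2f}, p = {f_p:.3f}")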