Mahmoudi Ghehsareh Mohadeseh, Asri Nastaran, Azizmohammad Looha Mehdi, Sadeghi Amir, Ciacci Carolina, Rostami-Nejad Mohammad
Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Celiac Disease and Gluten Related Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
Sci Rep. 2025 Aug 14;15(1):29871. doi: 10.1038/s41598-025-15898-6.
The role of artificial intelligence (AI) in providing information on celiac disease (CD) remains understudied. This study evaluated the accuracy and reliability of ChatGPT-3.5, the dominant publicly accessible version during the study period, in responding to 20 basic CD-related queries, with the aim of establishing a benchmark for AI-assisted CD education. Two independent experts rated the accuracy of ChatGPT's responses to the twenty frequently asked questions (FAQs) on a Likert scale, and the questions were then categorized by CD management domain. Inter-rater reliability (agreement between experts) was assessed using cross-tabulation, Cohen's kappa, and the Wilcoxon signed-rank test. Intra-rater reliability (consistency within each expert) was evaluated using the Friedman test with post hoc comparisons. ChatGPT demonstrated high accuracy in responding to CD FAQs, with expert ratings predominantly between 4 and 5. While overall performance was strong, responses on management strategies were rated higher than those on disease etiology. Inter-rater reliability analysis showed moderate agreement between the two experts (κ = 0.22, p = 0.026). Although both experts consistently assigned high scores across CD management categories, subtle discrepancies emerged in specific instances. Intra-rater reliability analysis indicated consistent scoring for one expert (p = 0.113) and some variability for the other (p < 0.001). ChatGPT shows potential as a reliable source of information for CD patients, particularly in the domain of disease management.
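To illustrate the reliability analyses named in the abstract, the short Python sketch below computes Cohen's kappa and a Wilcoxon signed-rank test on hypothetical Likert ratings from two experts across 20 responses, and a Friedman test across three hypothetical rating rounds by one expert. This is not the authors' code, and all rating values are invented for illustration only.

# Minimal, hypothetical sketch of the reliability analyses described in the abstract;
# the rating arrays are invented and do not reproduce the study's data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import wilcoxon, friedmanchisquare

# Hypothetical Likert ratings (1-5) from two experts for 20 FAQ responses.
expert_a = np.array([5, 5, 4, 5, 4, 5, 5, 4, 5, 5, 4, 5, 5, 5, 4, 5, 4, 5, 5, 4])
expert_b = np.array([5, 4, 4, 5, 5, 5, 5, 4, 5, 4, 4, 5, 5, 4, 4, 5, 5, 5, 5, 4])

# Inter-rater reliability: chance-corrected agreement (Cohen's kappa) and a
# Wilcoxon signed-rank test for a systematic shift between the paired ratings.
kappa = cohen_kappa_score(expert_a, expert_b)
w_stat, w_p = wilcoxon(expert_a, expert_b, zero_method="zsplit")
print(f"Cohen's kappa = {kappa:.2f}, Wilcoxon p = {w_p:.3f}")

# Intra-rater reliability: Friedman test across three hypothetical rating rounds
# by the same expert; a non-significant p suggests consistent scoring over rounds.
round_1 = expert_a
round_2 = np.array([5, 5, 4, 5, 4, 5, 4, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 4])
round_3 = np.array([5, 5, 5, 5, 4, 5, 5, 4, 5, 5, 4, 4, 5, 5, 4, 5, 4, 5, 5, 4])
f_stat, f_p = friedmanchisquare(round_1, round_2, round_3)
print(f"Friedman chi-square = {f_stat:.2f}, p = {f_p:.3f}")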