Alrubaian Abdullah
Department of Special Education, College of Education, Qassim University, Buraydah, Saudi Arabia.
Psychiatr Q. 2025 Jun 12. doi: 10.1007/s11126-025-10170-6.
This study evaluated the quality, usefulness, and reliability of three large language models (LLMs), ChatGPT-4, DeepSeek, and Gemini, in answering general questions about specific learning disorders (SLDs), specifically dyslexia and dyscalculia. For each disorder, 15 questions were developed through expert review of social media and forums, together with professional input. Responses from the LLMs were evaluated using the Global Quality Scale (GQS) for quality and a seven-point Likert scale for usefulness and reliability. Model performance was compared using descriptive statistics and one-way ANOVA. No statistically significant differences in quality or usefulness were found across models for either disorder. However, ChatGPT-4 demonstrated superior reliability for dyscalculia (p < 0.05), outperforming Gemini and DeepSeek. For dyslexia, DeepSeek attained the maximum reliability score for 100% of responses, compared with 60% for ChatGPT-4 and Gemini. All models provided high-quality responses, with mean GQS scores ranging from 4.20 to 4.60 for dyslexia and from 3.93 to 4.53 for dyscalculia, although their practical utility varied. LLMs show promise in delivering information about dyslexia and dyscalculia, and ChatGPT-4's reliability for dyscalculia highlights its potential as a supplementary educational tool. Further validation by professionals remains critical.
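The model comparison named in the abstract (one-way ANOVA across three groups of ratings) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's analysis: the group names and ratings are made up, each group has 5 ratings rather than the study's 15, and the helper `one_way_anova_f` is a hypothetical name introduced here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of rating lists.

    Hypothetical helper: computes the classic one-way ANOVA F statistic
    as (between-group mean square) / (within-group mean square).
    """
    k = len(groups)                              # number of groups (models)
    n_total = sum(len(g) for g in groups)        # total number of ratings
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative seven-point reliability ratings; NOT data from the study.
scores = {
    "model_a": [7, 7, 6, 7, 7],
    "model_b": [5, 6, 5, 6, 5],
    "model_c": [6, 5, 6, 5, 6],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The sketch stops at the F statistic. Obtaining a p-value such as the reported p < 0.05 requires the F distribution, which a statistics library provides (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value).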