Assessing the response quality and readability of ChatGPT in stuttering.

Author information

Saeedi Saeed, Bakhtiar Mehdi

Affiliations

Speech and Neuromodulation Laboratory, Unit of Human Communication, Learning and Development, Faculty of Education, The University of Hong Kong, Hong Kong Special Administrative Region of China.

Publication information

J Fluency Disord. 2025 Sep;85:106149. doi: 10.1016/j.jfludis.2025.106149. Epub 2025 Aug 15.

Abstract

OBJECTIVE

This study aimed to examine how well ChatGPT comprehends and answers frequently asked questions about stuttering.

METHODS

In this exploratory study, eleven common questions about stuttering were posed in a single conversation with GPT-4o mini. A panel of five certified speech and language pathologists (SLPs), blinded to the source of the answers (AI or SLPs), was asked to judge whether each response was produced by the ChatGPT chatbot or written by an SLP. The panelists also evaluated the responses against several criteria: the presence of inaccuracies, the potential to cause harm and the extent of any resulting harm, and alignment with the prevailing consensus within the SLP community. All ChatGPT responses were additionally evaluated with several readability features: the Flesch Reading Ease Score (FRES), Gunning Fog Scale Level (GFSL), Dale-Chall Score (D-CS), number of words, number of sentences, words per sentence (WPS), characters per word (CPW), and percentage of difficult words. Spearman's rank correlation coefficient was then used to examine the relationship between the panel's evaluations and the readability features.
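The two standard measures named above can be sketched briefly. The Flesch Reading Ease formula and Spearman's rho follow their textbook definitions; the vowel-group syllable counter is a rough heuristic (dedicated readability tools use pronunciation dictionaries), so this is a minimal illustrative sketch, not the authors' actual analysis pipeline.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels; at least one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # FRES = 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    return 206.835 - 1.015 * wps - 84.6 * spw

def spearman_rho(x, y) -> float:
    # Spearman's rank correlation = Pearson correlation of the ranks,
    # with tied values assigned their average rank.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average 1-based rank across the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

Lower FRES values indicate harder text (the study's mean of 26.52 falls in the "college graduate" band), and rho near ±1 indicates a strong monotonic association between a panel rating and a readability feature.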

RESULTS

A substantial proportion of the AI-generated responses (45.50%) were incorrectly identified by the SLP panel as being written by other SLPs, indicating high perceived human-likeness (origin). Regarding content quality, 83.60% of the responses were found to be accurate (incorrectness), 63.60% were rated as harmless (harm), and 38.20% were considered to cause only minor to moderate impact (extent of harm). In terms of professional alignment, 62% of the responses reflected the prevailing views within the SLP community (consensus). The means ± standard deviations of FRES, GFSL, and D-CS were 26.52 ± 13.94 (readable for college graduates), 18.17 ± 3.39 (readable for graduate students), and 9.90 ± 1.08 (readable for 13th to 15th grade [college]), respectively. Furthermore, each response contained an average of 99.73 words, 6.80 sentences, 17.44 words per sentence (WPS), 5.79 characters per word (CPW), and 27.96% difficult words. The correlation coefficients ranged from a large negative value (r = -0.909, p < 0.05) to a very large positive value (r = 0.918, p < 0.05).

CONCLUSION

The results indicate that ChatGPT shows promising capability in providing appropriate responses to frequently asked questions about stuttering, as evidenced by the panel of certified SLPs judging about 45% of its responses to have been written by SLPs. However, given the increasing accessibility of AI tools, particularly among individuals with limited access to professional services, it is crucial to emphasize that such tools are intended solely for educational purposes and should not replace diagnosis or treatment by qualified SLPs.
