Accuracy and Readability of ChatGPT Responses to Patient-Centric Strabismus Questions.

Author Information

Gary Ashlyn A, Lai James M, Locatelli Elyana V T, Falcone Michelle M, Cavuoto Kara M

Publication Information

J Pediatr Ophthalmol Strabismus. 2025 May-Jun;62(3):220-227. doi: 10.3928/01913913-20250110-02. Epub 2025 Feb 19.

Abstract

PURPOSE

To assess the medical accuracy and readability of responses provided by ChatGPT (OpenAI), the most widely used artificial intelligence-powered chatbot, regarding questions about strabismus.

METHODS

Thirty-four questions were input into ChatGPT 3.5 (free version) and 4.0 (paid version) at three time intervals (day 0, 1 week, and 1 month) in two distinct geographic locations (California and Florida) in March 2024. Two pediatric ophthalmologists rated responses as "acceptable," "accurate but missing key information or minor inaccuracies," or "inaccurate and potentially harmful." The online tool, Readable, measured the Flesch-Kincaid Grade Level and Flesch Reading Ease Score to assess readability.

RESULTS

Overall, 64% of responses by ChatGPT were "acceptable," but the proportion of "acceptable" responses differed by version (47% for ChatGPT 3.5 vs 53% for 4.0, P < .05) and state (77% in California vs 51% in Florida, P < .001). Responses in Florida were more likely to be "inaccurate and potentially harmful" than those in California (6.9% vs 1.5%, P < .001). Over 1 month, the overall percentage of "acceptable" responses increased (60% at day 0, 64% at 1 week, and 67% at 1 month, P > .05), whereas "inaccurate and potentially harmful" responses decreased (5% at day 0, 5% at 1 week, and 3% at 1 month, P > .05). On average, responses scored a Flesch-Kincaid Grade Level of 15, equating to a reading level above high school grade.

CONCLUSIONS

Although most of ChatGPT's responses to strabismus questions were clinically acceptable, there were variations in responses across time and geographic regions. The average reading level exceeded a high school level, demonstrating low readability. Although ChatGPT shows potential as a supplementary resource for parents and patients with strabismus, improving the accuracy and readability of free versions of ChatGPT may increase its utility.
