Accuracy and Readability of ChatGPT Responses to Patient-Centric Strabismus Questions.

Author Information

Gary Ashlyn A, Lai James M, Locatelli Elyana V T, Falcone Michelle M, Cavuoto Kara M

Publication Information

J Pediatr Ophthalmol Strabismus. 2025 May-Jun;62(3):220-227. doi: 10.3928/01913913-20250110-02. Epub 2025 Feb 19.

Abstract

PURPOSE

To assess the medical accuracy and readability of responses provided by ChatGPT (OpenAI), the most widely used artificial intelligence-powered chatbot, regarding questions about strabismus.

METHODS

Thirty-four questions were input into ChatGPT 3.5 (free version) and 4.0 (paid version) at three time intervals (day 0, 1 week, and 1 month) in two distinct geographic locations (California and Florida) in March 2024. Two pediatric ophthalmologists rated responses as "acceptable," "accurate but missing key information or minor inaccuracies," or "inaccurate and potentially harmful." The online tool, Readable, measured the Flesch-Kincaid Grade Level and Flesch Reading Ease Score to assess readability.

RESULTS

Overall, 64% of responses by ChatGPT were "acceptable," but the proportion of "acceptable" responses differed by version (47% for ChatGPT 3.5 vs 53% for 4.0, P < .05) and state (77% in California vs 51% in Florida, P < .001). Responses in Florida were more likely to be "inaccurate and potentially harmful" than those in California (6.9% vs 1.5%, P < .001). Over 1 month, the overall percentage of "acceptable" responses increased (60% at day 0, 64% at 1 week, and 67% at 1 month, P > .05), whereas "inaccurate and potentially harmful" responses decreased (5% at day 0, 5% at 1 week, and 3% at 1 month, P > .05). On average, responses scored a Flesch-Kincaid Grade Level of 15, equating to a reading level above high school grade.

CONCLUSIONS

Although most of ChatGPT's responses to strabismus questions were clinically acceptable, there were variations in responses across time and geographic regions. The average reading level exceeded a high school level, demonstrating low readability. Although ChatGPT shows potential as a supplementary resource for parents and patients with strabismus, improving the accuracy and readability of free versions of ChatGPT may increase its utility.
