Using large language models to generate child-friendly education materials on myopia.

Author information

Li Xuewei, Zhang Yixuan, Zheng Tonglei, Deng Yuanqi, Lu Yuchang, Hu Jie, Chen Sitong, Li Yan, Wang Kai

Affiliations

Department of Ophthalmology, Peking University People's Hospital, Eye Diseases and Optometry Institute, Beijing, China.

Institute of Medical Technology, Peking University Health Science Center, Beijing, China.

Publication information

Digit Health. 2025 Jul 30;11:20552076251362338. doi: 10.1177/20552076251362338. eCollection 2025 Jan-Dec.

DOI:10.1177/20552076251362338
PMID:40755959
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12317229/
Abstract

AIM

To evaluate the ability of large language models (LLMs) to produce patient education materials for myopic children and their parents.

METHODS

Thirty-five common myopia-related questions were used with two distinct prompts to produce responses aimed at adults (Prompt A) and children (Prompt B). Five ophthalmologists evaluated the responses using a 5-point Likert scale for correctness, completeness, conciseness, and potential harm. Readability was assessed via Flesch-Kincaid scores. The Kruskal-Wallis and Mann-Whitney tests were used to identify significant differences in LLM performance.
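The abstract reports readability only as "Flesch-Kincaid scores" and does not name the tool used. As a hedged illustration, the sketch below computes the two standard Flesch formulas (Reading Ease and Flesch-Kincaid Grade Level) with their published coefficients. The syllable counter `count_syllables` is a hypothetical vowel-group heuristic, so its scores will differ somewhat from dedicated readability software, and the two example sentences are invented, not taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count vowel groups, treating 'y' as a vowel."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a silent final "e" ("make"), but keep "-le" endings ("little").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


# Invented example sentences in the style of an adult-oriented and a child-oriented answer.
adult_text = ("Myopia results from excessive axial elongation of the eye, "
              "which focuses images in front of the retina.")
child_text = "Myopia means far things look blurry. Your eye is a bit too long."

for label, text in [("adult", adult_text), ("child", child_text)]:
    ease, grade = flesch_scores(text)
    print(f"{label}: reading ease {ease:.1f}, grade level {grade:.1f}")
```

Shorter sentences and fewer syllables per word raise Reading Ease and lower the grade level; both shifts indicate simpler text, which is the direction the study reports for the child-oriented Prompt B.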

RESULTS

ChatGPT 4o received the highest share of positive ratings ("good" or better) for correctness (Prompt A: 91%; Prompt B: 83%) and conciseness (Prompt A: 79%; Prompt B: 63%), and the highest share of low potential-harm ratings ("not at all" or "slightly" harmful; Prompt A: 99%; Prompt B: 97%), when generating educational materials for both adults and children (all P < 0.001). Completeness results differed between the two prompts. With Prompt A, ChatGPT 4.0 achieved the highest completeness (ChatGPT 4o: 69%, ChatGPT 4.0: 74%, ChatGPT 3.5: 51%, Gemini: 73%; P < 0.001), whereas with Prompt B, ChatGPT 4o scored highest (ChatGPT 4o: 71%, ChatGPT 4.0: 65%, ChatGPT 3.5: 38%, Gemini: 46%; P < 0.001). Across all LLMs, responses generated with Prompt B were significantly more readable than those generated with Prompt A (P ≤ 0.001).
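The abstract states that Kruskal-Wallis tests (comparing the four LLMs) and Mann-Whitney tests (comparing the two prompts) were used, but gives no further detail. The sketch below is a hedged, standard-library-only illustration of the Kruskal-Wallis H test with the usual tie correction and chi-square approximation; it is not the authors' analysis code. The ratings are invented, and the hypothetical helper `share_at_least` assumes that a rating of 4 corresponds to "good", which the abstract does not state. In practice an established implementation such as `scipy.stats.kruskal` or `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    y = max(x, 0.0) / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i < (df-1)/2} y^(i+1/2) / Gamma(i+3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for i in range((df - 1) // 2):
        total += math.exp(-y) * term
        term *= y / (i + 1.5)
    return total


def kruskal_wallis(*groups):
    """Kruskal-Wallis H with tie correction; p-value from the chi-square approximation."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def share_at_least(ratings, threshold=4):
    """Fraction of ratings >= threshold (assumes 4 = 'good' on the 5-point scale)."""
    return sum(r >= threshold for r in ratings) / len(ratings)


# Invented 5-point correctness ratings for four models (NOT data from the study).
ratings = {
    "model_1": [5, 5, 4, 5, 4, 5, 4, 5],
    "model_2": [4, 4, 5, 3, 4, 4, 5, 4],
    "model_3": [3, 4, 3, 2, 3, 4, 3, 3],
    "model_4": [4, 3, 4, 4, 5, 3, 4, 4],
}
h, p = kruskal_wallis(*ratings.values())
print(f"H = {h:.2f}, p = {p:.4f}")
for name, r in ratings.items():
    print(f"{name}: {share_at_least(r):.0%} rated 'good' or better")
```

With only two groups, H equals the square of the Mann-Whitney (Wilcoxon rank-sum) normal-approximation z statistic without continuity correction, so the same function gives the corresponding two-sided p-value for a Prompt A versus Prompt B comparison.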

CONCLUSION

Large language models, particularly ChatGPT 4o, hold potential for delivering effective patient education materials on myopia for both adult and pediatric populations. While generally effective, LLMs have limitations for complex medical queries, necessitating continued refinement for reliable clinical use.


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a0f/12317229/791599ff045e/10.1177_20552076251362338-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a0f/12317229/1f117eecf0ad/10.1177_20552076251362338-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a0f/12317229/ea6666545a3b/10.1177_20552076251362338-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a0f/12317229/2a258532c12f/10.1177_20552076251362338-fig4.jpg

Similar articles

1. Using large language models to generate child-friendly education materials on myopia.
   Digit Health. 2025 Jul 30;11:20552076251362338. doi: 10.1177/20552076251362338. eCollection 2025 Jan-Dec.
2. Evaluation of large language models in patient education and clinical decision support for rotator cuff injury: a two-phase benchmarking study.
   BMC Med Inform Decis Mak. 2025 Aug 4;25(1):289. doi: 10.1186/s12911-025-03105-5.
3. ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.
   Cochrane Evid Synth Methods. 2025 Jul 28;3(4):e70037. doi: 10.1002/cesm.70037. eCollection 2025 Jul.
4. Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions.
   Aesthetic Plast Surg. 2025 Jul 21. doi: 10.1007/s00266-025-05071-9.
5. Application of Large Language Models in Stroke Rehabilitation Health Education: 2-Phase Study.
   J Med Internet Res. 2025 Jul 22;27:e73226. doi: 10.2196/73226.
6. Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
   Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. eCollection 2025 Jun.
7. Assessment of Recommendations Provided to Athletes Regarding Sleep Education by GPT-4o and Google Gemini: Comparative Evaluation Study.
   JMIR Form Res. 2025 Jul 8;9:e71358. doi: 10.2196/71358.
8. Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study.
   J Med Internet Res. 2025 Jun 4;27:e69955. doi: 10.2196/69955.
9. Large language models: unlocking new potential in patient education for thyroid eye disease.
   Endocrine. 2025 Jul 19. doi: 10.1007/s12020-025-04339-z.
10. Assessment of readability, reliability, and quality of large language models in addressing frequently asked questions regarding prenatal screening for fetal chromosomal anomalies.
   Int J Gynaecol Obstet. 2025 Jul 1. doi: 10.1002/ijgo.70348.
