Assessing the Efficacy of Large Language Models in Health Literacy: A Comprehensive Cross-Sectional Study.

Affiliations

Yale College, New Haven, CT, USA.

Yale Child Study Center, Yale School of Medicine, New Haven, CT, USA.

Publication information

Yale J Biol Med. 2024 Mar 29;97(1):17-27. doi: 10.59249/ZTOZ1966. eCollection 2024 Mar.

DOI:10.59249/ZTOZ1966
PMID:38559461
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10964816/
Abstract

Enhanced health literacy in children has been empirically linked to better health outcomes over the long term; however, few interventions have been shown to improve health literacy. In this context, we investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measurement was the reading grade level (RGL) of output as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, output for basic prompts such as "Explain" and "What is (are)" was at or above the tenth-grade RGL. When prompts were specified to explain conditions from the first- to twelfth-grade level, we found that LLMs had varying abilities to tailor responses based on grade level. ChatGPT-3.5 provided responses that ranged from the seventh-grade to the college freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs face challenges in crafting outputs below a sixth-grade RGL. However, their capability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information regarding their health conditions, spanning from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
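
The four readability indices named in the abstract have standard published formulas, so a rough sketch of how a reading grade level might be computed for a piece of model output is given below. This is not the authors' tooling, which the abstract does not describe. The function names `count_syllables` and `readability` are hypothetical. The syllable counter is a crude vowel-group heuristic, and the Automated Readability Index here counts letters only (the original definition also counts digits), so scores will differ somewhat from those of established calculators.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return approximate grade levels from four standard readability formulas."""
    # Sentences: split on runs of terminal punctuation, ignoring empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    letters = sum(len(w.replace("'", "")) for w in words)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)  # 3+ syllables

    words_per_sentence = n_words / n_sent
    return {
        # Flesch-Kincaid Grade Level
        "flesch_kincaid": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / n_words
        - 15.59,
        # Gunning Fog index
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_words / n_words),
        # Automated Readability Index (letters only in this sketch)
        "ari": 4.71 * letters / n_words + 0.5 * words_per_sentence - 21.43,
        # Coleman-Liau index: letters and sentences per 100 words
        "coleman_liau": 0.0588 * (100 * letters / n_words)
        - 0.296 * (100 * n_sent / n_words)
        - 15.8,
    }


if __name__ == "__main__":
    sample = (
        "Asthma is a chronic inflammatory condition characterized by "
        "reversible airway obstruction and bronchial hyperresponsiveness."
    )
    for name, grade in readability(sample).items():
        print(f"{name}: {grade:.1f}")
```

Averaging the four values gives a single RGL estimate per response. Whether the study aggregated the indices this way is not stated in the abstract.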

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6662/10964816/a1b51c0b82b6/yjbm_97_1_17_g03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6662/10964816/0182739553b5/yjbm_97_1_17_g01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6662/10964816/ce51e999bc75/yjbm_97_1_17_g02.jpg
