Yale College, New Haven, CT, USA.
Yale Child Study Center, Yale School of Medicine, New Haven, CT, USA.
Yale J Biol Med. 2024 Mar 29;97(1):17-27. doi: 10.59249/ZTOZ1966. eCollection 2024 Mar.
Enhanced health literacy in children has been empirically linked to better long-term health outcomes; however, few interventions have been shown to improve health literacy. In this context, we investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measure was the reading grade level (RGL) of the output, assessed with the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, output for basic prompts such as "Explain" and "What is (are)" was at or above the tenth-grade RGL. When prompts asked for explanations of conditions at the first- through twelfth-grade levels, the LLMs varied in their ability to tailor responses to the requested grade level. ChatGPT-3.5 provided responses ranging from the seventh-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college-senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs face challenges in crafting outputs below a sixth-grade RGL. However, their capability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information about their health conditions, spanning from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
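The abstract does not specify the tooling used to compute the four readability indices, so the sketch below is an assumption for illustration only: a minimal Python implementation of the standard Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau formulas, using a naive vowel-group syllable heuristic rather than the dictionary-based counting that production readability tools typically use.

```python
import re

def _sentences(text):
    # Naive split on terminal punctuation; real tools handle abbreviations, etc.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def _words(text):
    return re.findall(r"[A-Za-z']+", text)

def _syllables(word):
    # Rough vowel-group heuristic (assumption; not the paper's method).
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def reading_grade_levels(text):
    """Return the four RGL indices named in the abstract, plus word count."""
    sents, words = _sentences(text), _words(text)
    n_sent, n_word = len(sents), len(words)
    n_syll = sum(_syllables(w) for w in words)
    n_char = sum(len(w) for w in words)
    n_complex = sum(1 for w in words if _syllables(w) >= 3)

    wps = n_word / n_sent          # words per sentence
    spw = n_syll / n_word          # syllables per word
    L = 100 * n_char / n_word      # letters per 100 words
    S = 100 * n_sent / n_word      # sentences per 100 words

    return {
        "flesch_kincaid": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * n_complex / n_word),
        "ari": 4.71 * n_char / n_word + 0.5 * wps - 21.43,
        "coleman_liau": 0.0588 * L - 0.296 * S - 15.8,
        "word_count": n_word,
    }

if __name__ == "__main__":
    sample = ("Asthma is a condition that makes it hard to breathe. "
              "Medicines can help open the airways.")
    print(reading_grade_levels(sample))
```

Each formula maps surface features (sentence length, syllable or character counts) onto an approximate US school grade level, which is why the indices can disagree on the same text and are usually reported together, as in this study.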