

Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study.

Authors

Spina Aidin, Andalib Saman, Flores Daniel, Vermani Rishi, Halaseh Faris F, Nelson Ariana M

Affiliations

School of Medicine, University of California, Irvine, Irvine, CA, United States.

Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, CA, United States.

Publication

JMIR AI. 2024 Aug 13;3:e54371. doi: 10.2196/54371.

DOI:10.2196/54371
PMID:39137416
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11350306/
Abstract

BACKGROUND

Although uncertainties exist regarding implementation, artificial intelligence-driven generative language models (GLMs) have enormous potential in medicine. Deployment of GLMs could improve patient comprehension of clinical texts and improve low health literacy.

OBJECTIVE

The goal of this study is to evaluate the potential of ChatGPT-3.5 and GPT-4 to tailor the complexity of medical information to patient-specific input education level, which is crucial if it is to serve as a tool in addressing low health literacy.

METHODS

Input templates related to 2 prevalent chronic diseases, type II diabetes and hypertension, were designed. Each clinical vignette was adjusted for hypothetical patient education levels to evaluate output personalization. To assess how successfully each GLM (GPT-3.5 and GPT-4) tailored its output writing, the readability of pre- and posttransformation outputs was quantified using the Flesch reading ease score (FKRE) and the Flesch-Kincaid grade level (FKGL).
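The two readability metrics used in the methods are simple functions of average sentence length and average syllables per word. A minimal sketch of both formulas in Python (the vowel-group syllable counter is a crude assumption for illustration, not the tooling the authors used):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels (min 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch reading ease, Flesch-Kincaid grade level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return round(fre, 2), round(fkgl, 2)
```

Higher reading ease and lower grade level both indicate simpler text, so a successful simplification should raise the first score and lower the second.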

RESULTS

Responses (n=80) were generated using GPT-3.5 and GPT-4 across 2 clinical vignettes. For GPT-3.5, FKRE means were 57.75 (SD 4.75), 51.28 (SD 5.14), 32.28 (SD 4.52), and 28.31 (SD 5.22) for 6th grade, 8th grade, high school, and bachelor's, respectively; FKGL means were 9.08 (SD 0.90), 10.27 (SD 1.06), 13.4 (SD 0.80), and 13.74 (SD 1.18). GPT-3.5 aligned with the prespecified education level only at the bachelor's degree. Conversely, GPT-4's FKRE means were 74.54 (SD 2.6), 71.25 (SD 4.96), 47.61 (SD 6.13), and 13.71 (SD 5.77), with FKGL means of 6.3 (SD 0.73), 6.7 (SD 1.11), 11.09 (SD 1.26), and 17.03 (SD 1.11) for the same respective education levels. GPT-4 met the target readability for all groups except the 6th-grade FKRE average. Both GLMs produced statistically significant differences in mean FKRE and FKGL across input education levels (FKRE: 6th grade P<.001; 8th grade P<.001; high school P<.001; bachelor's P=.003; FKGL: 6th grade P=.001; 8th grade P<.001; high school P<.001; bachelor's P<.001).

CONCLUSIONS

GLMs can change the structure and readability of medical text outputs according to input-specified education. However, GLMs categorize input education designation into 3 broad tiers of output readability: easy (6th and 8th grade), medium (high school), and difficult (bachelor's degree). This is the first result to suggest that there are broader boundaries in the success of GLMs in output text simplification. Future research must establish how GLMs can reliably personalize medical texts to prespecified education levels to enable a broader impact on health care literacy.
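The three-tier pattern described in the conclusions can be made concrete as a small classifier. The numeric cutoffs below are illustrative assumptions chosen to separate the GPT-4 FKGL means reported in the results (roughly 6.3-6.7, 11.09, and 17.03); the paper itself does not define thresholds:

```python
def tier(fkgl: float) -> str:
    # Illustrative cutoffs (not from the paper) separating the three
    # observed output-readability tiers.
    if fkgl < 9.0:
        return "easy"       # 6th- and 8th-grade inputs clustered here
    if fkgl < 14.0:
        return "medium"     # high-school inputs
    return "difficult"      # bachelor's-degree inputs
```

Applied to the reported GPT-4 means, the four input education levels collapse into exactly these three output tiers, which is the paper's central finding.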


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/54800eb935ab/ai_v3i1e54371_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/5995afff1801/ai_v3i1e54371_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/0d58b892a057/ai_v3i1e54371_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/771d4e5e80e4/ai_v3i1e54371_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/8475f88dfe9e/ai_v3i1e54371_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e94/11350306/2df6cbdfd1d7/ai_v3i1e54371_fig6.jpg

Similar Articles

1. Evaluation of Generative Language Models in Personalizing Medical Information: Instrument Validation Study.
JMIR AI. 2024 Aug 13;3:e54371. doi: 10.2196/54371.
2. Assessing the Application of Large Language Models in Generating Dermatologic Patient Education Materials According to Reading Level: Qualitative Study.
JMIR Dermatol. 2024 May 16;7:e55898. doi: 10.2196/55898.
3. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.
J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.
4. Can Artificial Intelligence Improve the Readability of Patient Education Materials on Aortic Stenosis? A Pilot Study.
Cardiol Ther. 2024 Mar;13(1):137-147. doi: 10.1007/s40119-023-00347-0. Epub 2024 Jan 9.
5. AI-Generated Information for Vascular Patients: Assessing the Standard of Procedure-Specific Information Provided by the ChatGPT AI-Language Model.
Cureus. 2023 Nov 30;15(11):e49764. doi: 10.7759/cureus.49764. eCollection 2023 Nov.
6. Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis.
Surg Endosc. 2024 May;38(5):2887-2893. doi: 10.1007/s00464-024-10739-5. Epub 2024 Mar 5.
7. The Use of Large Language Models to Generate Education Materials about Uveitis.
Ophthalmol Retina. 2024 Feb;8(2):195-201. doi: 10.1016/j.oret.2023.09.008. Epub 2023 Sep 15.
8. Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
9. Generative Pre-trained Transformer 4 makes cardiovascular magnetic resonance reports easy to understand.
J Cardiovasc Magn Reson. 2024 Summer;26(1):101035. doi: 10.1016/j.jocmr.2024.101035. Epub 2024 Mar 7.
10. Accuracy, readability, and understandability of large language models for prostate cancer information to the public.
Prostate Cancer Prostatic Dis. 2024 May 14. doi: 10.1038/s41391-024-00826-y.

Cited By

1. Enhancing Magnetic Resonance Imaging (MRI) Report Comprehension in Spinal Trauma: Readability Analysis of AI-Generated Explanations for Thoracolumbar Fractures.
JMIR AI. 2025 Jul 1;4:e69654. doi: 10.2196/69654.
2. Using AI to Translate and Simplify Spanish Orthopedic Medical Text: Instrument Validation Study.
JMIR AI. 2025 Mar 21;4:e70222. doi: 10.2196/70222.
3. Tailoring glaucoma education using large language models: Addressing health disparities in patient comprehension.
Medicine (Baltimore). 2025 Jan 10;104(2):e41059. doi: 10.1097/MD.0000000000041059.
4. Source Characteristics Influence AI-Enabled Orthopaedic Text Simplification: Recommendations for the Future.
JB JS Open Access. 2025 Jan 8;10(1). doi: 10.2106/JBJS.OA.24.00007. eCollection 2025 Jan-Mar.

References

1. Does ChatGPT Answer Otolaryngology Questions Accurately?
Laryngoscope. 2024 Sep;134(9):4011-4015. doi: 10.1002/lary.31410. Epub 2024 Mar 28.
2. Reliability and accuracy of artificial intelligence ChatGPT in providing information on ophthalmic diseases and management to patients.
Eye (Lond). 2024 May;38(7):1368-1373. doi: 10.1038/s41433-023-02906-0. Epub 2024 Jan 20.
3. Optimizing Ophthalmology Patient Education via ChatBot-Generated Materials: Readability Analysis of AI-Generated Patient Education Materials and The American Society of Ophthalmic Plastic and Reconstructive Surgery Patient Brochures.
Ophthalmic Plast Reconstr Surg. 2024;40(2):212-216. doi: 10.1097/IOP.0000000000002549. Epub 2023 Nov 16.
4. ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study.
JMIR Med Educ. 2023 Nov 10;9:e49877. doi: 10.2196/49877.
5. Enhancing Patient Communication With Chat-GPT in Radiology: Evaluating the Efficacy and Readability of Answers to Common Imaging-Related Questions.
J Am Coll Radiol. 2024 Feb;21(2):353-359. doi: 10.1016/j.jacr.2023.09.011. Epub 2023 Oct 18.
6. Evaluation of Artificial Intelligence-generated Responses to Common Plastic Surgery Questions.
Plast Reconstr Surg Glob Open. 2023 Aug 30;11(8):e5226. doi: 10.1097/GOX.0000000000005226. eCollection 2023 Aug.
7. Readability of spine-related patient education materials: a standard method for improvement.
Eur Spine J. 2023 Sep;32(9):3039-3046. doi: 10.1007/s00586-023-07856-5. Epub 2023 Jul 19.
8. Evaluating the Effectiveness of Artificial Intelligence-powered Large Language Models Application in Disseminating Appropriate and Readable Health Information in Urology.
J Urol. 2023 Oct;210(4):688-694. doi: 10.1097/JU.0000000000003615. Epub 2023 Jul 10.
9. Bridging the Gap Between Urological Research and Patient Understanding: The Role of Large Language Models in Automated Generation of Layperson's Summaries.
Urol Pract. 2023 Sep;10(5):436-443. doi: 10.1097/UPJ.0000000000000428. Epub 2023 Jul 5.
10. Can ChatGPT, an Artificial Intelligence Language Model, Provide Accurate and High-quality Patient Information on Prostate Cancer?
Urology. 2023 Oct;180:35-58. doi: 10.1016/j.urology.2023.05.040. Epub 2023 Jul 4.