Assessing Large Language Model Performance Related to Aging in Genetic Conditions.

Authors

Othman Amna A, Flaharty Kendall A, Ledgister Hanchard Suzanna E, Hu Ping, Duong Dat, Waikel Rebekah L, Solomon Benjamin D

Affiliation

Medical Genomics Unit, National Human Genome Research Institute, National Institutes of Health, 10 Center Drive, Bethesda, MD, 20892, USA.

Publication information

medRxiv. 2025 Jan 20:2025.01.19.25320798. doi: 10.1101/2025.01.19.25320798.

Abstract

Unlike some health conditions that have been extensively delineated throughout the lifespan, many genetic conditions are largely described in pediatric populations, with a focus on early manifestations like congenital anomalies and developmental delay. An apparent gap exists in understanding clinical features and optimal management as patients age. Generative artificial intelligence is transforming biomedical disciplines, including through the introduction of large language models (LLMs). Motivated by these advances, we explored how LLMs handle age with respect to 282 genetic conditions selected based on prevalence. We divided these conditions into five categories: Disorders limited to childhood; Disorders limited to adulthood; Disorders with changes in presentation across ages; Disorders with changes in management across ages; Disorders with no changes across ages. We evaluated the capabilities of Llama-2-70b-chat (70b) and GPT-3.5 (GPT) in generating accurate medical vignettes for these conditions, with 3 clinicians grading the output on Correctness, Completeness, and Conciseness. Using accurately generated vignettes as in-context prompts, we further generated and evaluated patient-geneticist dialogues and assessed LLM performance in answering specific questions regarding age-based management plans for a subset of conditions. Results revealed impressive performance in generating vignettes by both 70b with in-context prompting and GPT. Overall, we did not observe age-based biases, though our experiments identified statistically significant differences in some areas related to LLM output. Despite impressive capabilities, LLMs still have limitations in clinical applications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e15b/12191088/3997ddd219d9/nihpp-2025.01.19.25320798v1-f0001.jpg
