Clinical efficacy of pre-trained large language models through the lens of aphasia.

Affiliations

School of Languages and Cultures, Purdue University, West Lafayette, USA.

Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, USA.

Publication information

Sci Rep. 2024 Jul 6;14(1):15573. doi: 10.1038/s41598-024-66576-y.

DOI:10.1038/s41598-024-66576-y

PMID:38971898

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11227580/

Abstract

The rapid development of large language models (LLMs) motivates us to explore how such state-of-the-art natural language processing systems can inform aphasia research. What kinds of language indices can we derive from a pre-trained LLM? How do they differ from, or relate to, existing language features in aphasia? To what extent can LLMs serve as an interpretable and effective diagnostic and measurement tool in a clinical context? To investigate these questions, we constructed predictive and correlational models that use mean surprisals from LLMs as predictor variables. Using archived AphasiaBank data, we validated our models' efficacy in aphasia diagnosis, measurement, and prediction. We find that LLM surprisals can effectively detect both the presence of aphasia and differences in the nature of the disorder; that LLMs, combined with existing language indices, improve the models' efficacy in subtyping aphasia; and that LLM surprisals capture common agrammatic deficits at both the word and sentence levels. Overall, LLMs have the potential to advance automatic and precise aphasia prediction. A natural language processing pipeline can benefit greatly from integrating LLMs, enabling us to refine models of existing language disorders such as aphasia.
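
The abstract's central predictor is mean surprisal. As a minimal sketch, assuming surprisal takes its usual definition of -log2 of the probability a pre-trained LLM assigns to each token given its preceding context, the averaging step might look like the Python below. The token probabilities are passed in directly rather than computed by a model. The function names (`token_surprisals`, `mean_surprisal`, `speaker_mean_surprisal`) and the choice to pool tokens across a speaker's utterances are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Sequence


def token_surprisals(probs: Sequence[float]) -> list[float]:
    """Return the surprisal, in bits, of each token: -log2 P(token | context).

    `probs` holds the (hypothetical) probabilities a language model assigned
    to each observed token given the tokens before it.
    """
    for p in probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range (0, 1]: {p}")
    return [-math.log2(p) for p in probs]


def mean_surprisal(probs: Sequence[float]) -> float:
    """Average token surprisal over one utterance."""
    if len(probs) == 0:
        raise ValueError("need at least one token")
    surprisals = token_surprisals(probs)
    return sum(surprisals) / len(surprisals)


def speaker_mean_surprisal(utterances: Sequence[Sequence[float]]) -> float:
    """Pool tokens from all of a speaker's utterances, then average.

    Pooling (rather than averaging per-utterance means) is an assumption;
    it gives longer utterances more weight.
    """
    pooled = [p for utterance in utterances for p in utterance]
    return mean_surprisal(pooled)


if __name__ == "__main__":
    print(token_surprisals([0.5, 0.25, 0.125]))  # [1.0, 2.0, 3.0]
    print(mean_surprisal([0.5, 0.25, 0.125]))    # 2.0
```

For example, token probabilities of 0.5, 0.25 and 0.125 give surprisals of 1, 2 and 3 bits and a mean of 2.0 bits. In a real pipeline these probabilities would come from the model's next-token distribution. Averaging per utterance first and then across utterances is an equally plausible aggregation, and the abstract does not specify which one is used.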

Similar articles

Clinical efficacy of pre-trained large language models through the lens of aphasia.

Sci Rep. 2024 Jul 6;14(1):15573. doi: 10.1038/s41598-024-66576-y.

On the role of the UMLS in supporting diagnosis generation proposed by Large Language Models.

J Biomed Inform. 2024 Sep;157:104707. doi: 10.1016/j.jbi.2024.104707. Epub 2024 Aug 13.

Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.

J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.

Potential of Large Language Models in Health Care: Delphi Study.

J Med Internet Res. 2024 May 13;26:e52399. doi: 10.2196/52399.

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.

JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.

A comprehensive evaluation of large Language models on benchmark biomedical text processing tasks.

Comput Biol Med. 2024 Mar;171:108189. doi: 10.1016/j.compbiomed.2024.108189. Epub 2024 Feb 20.

Assessing the research landscape and clinical utility of large language models: a scoping review.

BMC Med Inform Decis Mak. 2024 Mar 12;24(1):72. doi: 10.1186/s12911-024-02459-6.

Unlocking the Potential of Free Text in Electronic Health Records with Large Language Models (LLM): Enhancing Patient Safety and Consultation Interactions.

Stud Health Technol Inform. 2024 Aug 22;316:746-750. doi: 10.3233/SHTI240521.

Scale matters: Large language models with billions (rather than millions) of parameters better match neural representations of natural language.

bioRxiv. 2024 Oct 16:2024.06.12.598513. doi: 10.1101/2024.06.12.598513.

A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare.

medRxiv. 2024 Apr 27:2024.04.26.24306390. doi: 10.1101/2024.04.26.24306390.

Cited by

Treatment of aphasia in linguistically diverse populations: current and future directions.

Front Psychol. 2025 Aug 14;16:1612413. doi: 10.3389/fpsyg.2025.1612413. eCollection 2025.

ABCD: A Simulation Method for Accelerating Conversational Agents With Applications in Aphasia Therapy.

J Speech Lang Hear Res. 2025 Jul 8;68(7):3322-3336. doi: 10.1044/2025_JSLHR-25-00003. Epub 2025 Jun 13.

References

Large language models in health care: Development, applications, and challenges.

Health Care Sci. 2023 Jul 24;2(4):255-263. doi: 10.1002/hcs2.61. eCollection 2023 Aug.

Strong Prediction: Language Model Surprisal Explains Multiple N400 Effects.

Neurobiol Lang (Camb). 2024 Apr 1;5(1):107-135. doi: 10.1162/nol_a_00105. eCollection 2024.

Word Frequency and Predictability Dissociate in Naturalistic Reading.

Open Mind (Camb). 2024 Mar 5;8:177-201. doi: 10.1162/opmi_a_00119. eCollection 2024.

Large-scale evidence for logarithmic effects of word predictability on reading time.

Proc Natl Acad Sci U S A. 2024 Mar 5;121(10):e2307876121. doi: 10.1073/pnas.2307876121. Epub 2024 Feb 29.

Agrammatic output in non-fluent, including Broca's, aphasia as a rational behavior.

Aphasiology. 2023;37(12):1981-2000. doi: 10.1080/02687038.2022.2143233. Epub 2022 Nov 18.

Automating Intended Target Identification for Paraphasias in Discourse Using a Large Language Model.

J Speech Lang Hear Res. 2023 Dec 11;66(12):4949-4966. doi: 10.1044/2023_JSLHR-23-00121. Epub 2023 Nov 6.

Large Language Models for Therapy Recommendations Across 3 Clinical Specialties: Comparative Study.

J Med Internet Res. 2023 Oct 30;25:e49324. doi: 10.2196/49324.

Comparison of category and letter fluency tasks through automated analysis.

Front Psychol. 2023 Oct 11;14:1212793. doi: 10.3389/fpsyg.2023.1212793. eCollection 2023.

Measuring Sentence Information via Surprisal: Theoretical and Clinical Implications in Nonfluent Aphasia.

Ann Neurol. 2023 Oct;94(4):647-657. doi: 10.1002/ana.26744. Epub 2023 Aug 18.

Automation of Language Sample Analysis.

J Speech Lang Hear Res. 2023 Jul 12;66(7):2421-2433. doi: 10.1044/2023_JSLHR-22-00642. Epub 2023 Jun 22.
