Neubig Luisa, Larsen Deirdre, Kunduk Melda, Kist Andreas M
Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen 91054, Germany.
Department of Communication Sciences and Disorders, East Carolina University, Greenville, NC 27858, USA.
IEEE J Transl Eng Health Med. 2025 May 19;13:237-245. doi: 10.1109/JTEHM.2025.3571255. eCollection 2025.
Dysphagia is a common and complex disorder that complicates both diagnosis and treatment. Consequently, the associated electronic health records (EHR) are often unstructured and complex, posing challenges for systematic data analysis.
In this study, we employ natural language processing (NLP) techniques and large language models (LLMs) to automatically analyze clinical narratives and extract diagnostic information from a diverse set of EHRs. Our dataset includes medical records from 486 patients, representing a group with diverse dysphagic conditions. We analyze diagnoses provided in unstructured free text that do not follow a standardized structure. We utilize clustering algorithms on the extracted diagnostic features to identify distinct groups of patients who share similar pathophysiological swallowing dysfunctions.
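As a rough illustration of the kind of basic NLP baseline described above, the following minimal sketch groups free-text diagnoses by bag-of-words cosine similarity. Everything in it is an assumption made for illustration: the example records, the stopword list, the similarity threshold, and the greedy grouping rule are hypothetical and are not the clustering algorithm, features, or data used in the study.

```python
import math
import re
from collections import Counter

# Hypothetical free-text diagnoses (NOT taken from the study's dataset).
RECORDS = [
    "oropharyngeal dysphagia after stroke",
    "dysphagia following ischemic stroke",
    "esophageal dysphagia due to reflux",
    "reflux related esophageal stricture",
]

# Assumed stopword list. "dysphagia" is included because it occurs in
# almost every record and therefore does not help separate groups.
STOPWORDS = {"after", "following", "due", "to", "related", "dysphagia"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(records, threshold=0.3):
    """Greedy grouping: attach each record to the first cluster whose seed
    record is at least `threshold` similar, otherwise start a new cluster.
    Returns a list of clusters, each a list of record indices."""
    vectors = [Counter(tokenize(r)) for r in records]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    for members in cluster(RECORDS):
        print([RECORDS[i] for i in members])
```

On these four toy records the stroke-related and reflux-related diagnoses end up in separate groups. The grouping depends entirely on shared surface vocabulary, so synonyms, abbreviations, and misspellings would not be matched, which is consistent with the limitation of basic NLP techniques on highly variable records.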
We found that basic NLP techniques often provide limited insight because the data are highly variable. In contrast, LLMs help bridge the gap in understanding nuanced medical information about dysphagia and related conditions. Although applying LLMs is not straightforward, our results demonstrate that closed-source models can effectively cluster different categories of dysphagia.
Our study therefore provides evidence that LLMs are highly promising for future dysphagia research.
Dysphagia is a symptom associated with various diseases, though its underlying relationships remain unclear. This study demonstrates how analyzing large volumes of electronic health records can help clarify the causes of dysphagia and identify contributing factors. By applying natural language processing, we aim to enhance both understanding and treatment, supporting clinical staff in improving individualized care by identifying relevant patient cohorts.
Clinical and Translational Impact Statement: This study uses LLMs to efficiently preprocess unstructured EHRs, improving dysphagia diagnosis and patient clustering. It aligns with Clinical Research, enhancing diagnostic speed and enabling personalized treatment.