
Alzheimer's disease recognition using graph neural network by leveraging image-text similarity from vision language model.

Authors

Lee Byounghwa, Bang Jeong-Uk, Song Hwa Jeon, Kang Byung Ok

Affiliation

Integrated Intelligence Research Section, Electronics and Telecommunications Research Institute, Daejeon, 34129, Republic of Korea.

Publication

Sci Rep. 2025 Jan 6;15(1):997. doi: 10.1038/s41598-024-82597-z.

Abstract

Alzheimer's disease (AD), a progressive neurodegenerative condition, notably impacts cognitive functions and daily activity. One method of detecting dementia involves a task where participants describe a given picture, and extensive research has been conducted using the participants' speech and transcribed text. However, very few studies have explored the modality of the image itself. In this work, we propose a method that predicts dementia automatically by representing the relationship between images and texts as a graph. First, we transcribe the participants' speech into text using an automatic speech recognition system. Then, we employ a vision language model to represent the relationship between the parts of the image and the corresponding descriptive sentences as a bipartite graph. Finally, we use a graph convolutional network (GCN), considering each subject as an individual graph, to classify AD patients through a graph-level classification task. In experiments conducted on the ADReSSo Challenge datasets, our model surpassed the existing state-of-the-art performance by achieving an accuracy of 88.73%. Additionally, ablation studies that removed the relationship between images and texts demonstrated the critical role of graphs in improving performance. Furthermore, by utilizing the sentence representations learned through the GCN, we identified the sentences and keywords critical for AD classification.
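The pipeline described above (VLM similarity scores forming a bipartite image-text graph, then a GCN with graph-level pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the similarity matrix, feature dimensions, and single-layer network below are hypothetical stand-ins chosen only to show the data flow from bipartite edges to a graph-level prediction.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)

# Bipartite graph for one subject: 3 image-region nodes, 4 sentence nodes.
# Edge weights stand in for image-text similarity scores from a VLM.
sim = rng.random((3, 4))
n = 3 + 4
A = np.zeros((n, n))
A[:3, 3:] = sim        # region -> sentence edges
A[3:, :3] = sim.T      # sentence -> region edges (undirected graph)

X = rng.random((n, 8))  # node features (e.g., VLM embeddings), dim 8
W = rng.random((8, 2))  # learnable weights mapping to 2 classes (AD / control)

H = gcn_layer(A, X, W)           # node representations, shape (7, 2)
graph_logits = H.mean(axis=0)    # mean pooling -> graph-level scores, shape (2,)
print(graph_logits.shape)
```

In practice each subject is one such graph, a readout like the mean pooling here produces a single vector per graph, and the class scores are trained with a standard cross-entropy loss; the learned sentence-node representations (rows of `H`) are what the abstract uses to surface AD-discriminative sentences and keywords.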

