Wakao Hirotaka, Iizuka Tomomichi, Shimizu Akinobu
Institute of Engineering, Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan.
Center for Dementia, Fukujuji Hospital, Kiyose, Tokyo, Japan.
Int J Comput Assist Radiol Surg. 2025 May 9. doi: 10.1007/s11548-025-03365-6.
This study proposes a vision transformer (ViT)-based model for dementia classification, capable of distinguishing the representative dementias (Alzheimer's disease, dementia with Lewy bodies, and frontotemporal dementia) from healthy controls using brain single-photon emission computed tomography (SPECT) images. The proposed method allows input patches based on the anatomical structure of the brain and makes efficient use of five different SPECT images.
The proposed model comprises a linear projection of input patches, eight transformer encoder layers, and a multilayer perceptron for classification, with the following features: 1. diverse feature extraction using a multi-head structure for the five different SPECT images; 2. Brodmann area-based input patches reflecting the anatomical structure of the brain; 3. cross-attention for fusing the diverse features.
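The two distinctive components, Brodmann area-based patch embedding and cross-attention fusion across modalities, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the atlas shape, the number of areas, the embedding dimension, and the single-head attention are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def brodmann_patches(volume, atlas, n_areas):
    """Pool voxel intensities into one feature per Brodmann area (mean pooling).
    volume: (D,H,W) SPECT intensities; atlas: (D,H,W) integer labels in [0, n_areas)."""
    flat_v, flat_a = volume.ravel(), atlas.ravel()
    sums = np.bincount(flat_a, weights=flat_v, minlength=n_areas)
    counts = np.bincount(flat_a, minlength=n_areas)
    return sums / np.maximum(counts, 1)  # shape (n_areas,)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: tokens from one image attend to the others."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

# Toy example: 47 areas (illustrative count), embedding dim 16, five SPECT maps.
n_areas, d = 47, 16
atlas = rng.integers(0, n_areas, size=(8, 8, 8))          # hypothetical atlas labels
spect_images = [rng.random((8, 8, 8)) for _ in range(5)]  # five SPECT-derived volumes

# Linear projection of each scalar area feature to a d-dimensional token.
W_embed = rng.standard_normal((1, d)) * 0.02
tokens = [brodmann_patches(v, atlas, n_areas)[:, None] @ W_embed for v in spect_images]

# Fuse: tokens of the first image query the concatenated tokens of the other four.
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
fused = cross_attention(tokens[0], np.concatenate(tokens[1:]), Wq, Wk, Wv)
print(fused.shape)  # (47, 16): one fused token per Brodmann area
```

In the paper's model these fused tokens would then pass through the transformer encoder stack and the classification head; here only the area-wise pooling and the fusion step are shown.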
The proposed method achieved a classification accuracy of 85.89% on 418 SPECT images from real clinical cases, significantly outperforming previous studies. Ablation studies were conducted to validate each contribution, and the consistency between the model's attention maps and the regions physicians attend to was analyzed in detail.
The proposed ViT-based model demonstrated superior dementia classification accuracy compared to previous methods and is thus expected to contribute to the early diagnosis and treatment of dementia using SPECT imaging. In future work, we aim to further improve accuracy by incorporating patient clinical information.