Yuda Bi, Anees Abrol, Zening Fu, Vince D. Calhoun
Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia Tech, Emory, Atlanta, Georgia, USA.
Hum Brain Mapp. 2024 Dec 1;45(17):e26783. doi: 10.1002/hbm.26783.
Multimodal neuroimaging is an emerging field that leverages multiple sources of information to diagnose specific brain disorders, especially when deep learning-based AI algorithms are applied. Successfully combining different brain imaging modalities using deep learning remains a challenging yet crucial research topic. The integration of structural and functional modalities is particularly important for diagnosis, since structural information plays a central role in diseases such as Alzheimer's, while functional imaging is more informative for disorders such as schizophrenia; combining the two can therefore provide a more comprehensive diagnosis. In this work, we present MultiViT, a novel diagnostic deep learning model that uses vision transformers and cross-attention mechanisms to fuse information from 3D gray matter maps derived from structural MRI with functional network connectivity (FNC) matrices obtained from functional MRI via independent component analysis (ICA). MultiViT achieves an AUC of 0.833, outperforming both our unimodal and multimodal baselines and enabling more accurate classification and diagnosis of schizophrenia. In addition, using the vision transformer's attention maps in combination with the cross-attention mechanism and brain functional information, we identify critical brain regions in 3D gray matter space associated with the characteristics of schizophrenia. Our research not only significantly improves the accuracy of AI-based automated imaging diagnostics for schizophrenia, but also pioneers a rational and advanced data fusion approach: replacing complex, high-dimensional fMRI information with functional network connectivity, integrating it with representative structural data from 3D gray matter images, and further providing interpretable biomarker localization in 3D structural space.
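The core fusion step the abstract describes, structural (sMRI) tokens attending to functional (FNC) tokens via cross-attention, can be sketched in a few lines. This is a minimal illustrative sketch in NumPy, not the paper's implementation; all shapes, dimensions, and the random projection matrices are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16        # token embedding dimension (assumed)
n_struct = 8  # tokens from 3D gray-matter patches (assumed)
n_func = 4    # tokens derived from the FNC matrix (assumed)

# Stand-ins for embedded modality tokens
struct_tokens = rng.standard_normal((n_struct, d))  # sMRI side -> queries
func_tokens = rng.standard_normal((n_func, d))      # fMRI/FNC side -> keys, values

# Learnable projections (random stand-ins here)
Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

Q = struct_tokens @ Wq
K = func_tokens @ Wk
V = func_tokens @ Wv

# Scaled dot-product cross-attention: each structural token forms a
# distribution over functional tokens and pools their values.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
fused = weights @ V  # shape (n_struct, d): structure enriched with function

print(fused.shape)  # (8, 16)
```

Because the queries come from the structural stream, the attention weights over functional tokens also indicate which functional connectivity patterns influence each gray-matter region, which is the kind of signal the paper exploits for biomarker localization.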