Sun Jian, Dodge Hiroko H, Mahoor Mohammad H
Department of Computer Science, University of Denver, 2155 E Wesley Ave, Denver, Colorado, 80210, United States of America.
Department of Neurology at Harvard Medical School, Harvard University, Massachusetts General Hospital, 55 Fruit St, Boston, Massachusetts, 02114, United States of America.
Expert Syst Appl. 2024 Mar 15;238(Pt B). doi: 10.1016/j.eswa.2023.121929. Epub 2023 Oct 4.
Deep machine learning models, including Convolutional Neural Networks (CNNs), have been successful in detecting Mild Cognitive Impairment (MCI) using medical images, questionnaires, and videos. This paper proposes a novel Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to distinguish participants with MCI from those with normal cognition by analyzing facial features. The data come from I-CONECT, a behavioral intervention trial aimed at improving cognitive function by providing frequent video chats. MC-ViViT extracts spatiotemporal features of videos in one branch and augments the representations with the MC module. The I-CONECT dataset is challenging because it is imbalanced, containing Hard-Easy and Positive-Negative samples, which impedes the performance of MC-ViViT. We propose a loss function for Hard-Easy and Positive-Negative samples (HP Loss) that combines Focal loss and AD-CORRE loss to address the imbalance problem. Our experimental results on the I-CONECT dataset show the great potential of MC-ViViT in predicting MCI, achieving an accuracy of 90.63% on some of the interview videos.
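The abstract states that HP Loss combines Focal loss with AD-CORRE loss, but does not give the formulation; the AD-CORRE component is not reproduced here. As a point of reference only, a minimal NumPy sketch of the standard binary Focal loss (the first ingredient, from Lin et al., 2017), with illustrative default values for `gamma` and `alpha`, might look like:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy, well-classified
    examples so training focuses on hard ones.

    p : predicted probability of the positive class
    y : ground-truth label (0 or 1)
    """
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A confident correct prediction (easy sample) contributes far less
# loss than an uncertain one (hard sample), which is how the Hard-Easy
# imbalance described above can be addressed.
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.55]), np.array([1]))
print(easy, hard)
```

The modulating factor `(1 - p_t)^gamma` is what suppresses easy samples; how the paper weights this term against the AD-CORRE term in HP Loss is not specified in this abstract.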