Jin Jin, Wang Fan, Tian Shengzheng
School of Information and Intelligent Engineering, Zhejiang Wanli University, Ningbo, Zhejiang, China.
ZHONGTIETONG Rail Transit Operation Co. Ltd., Wenzhou, Zhejiang, China.
PLoS One. 2025 Sep 3;20(9):e0330684. doi: 10.1371/journal.pone.0330684. eCollection 2025.
Multi-modal classification aims to extract pertinent information from multiple modalities in order to assign labels to instances. The advent of deep neural networks has significantly advanced this task; however, most current deep neural networks lack interpretability, which breeds skepticism about their predictions. This issue is particularly pronounced in sensitive domains such as educational assessment. To address this trust deficit in deep neural networks for multi-modal classification, we propose an Interpretable Multi-modal Classification framework (ICMC), which enhances confidence in both the processes and the outcomes of deep neural networks while preserving interpretability and improving performance. Specifically, our approach incorporates a confidence-driven attention mechanism at the intermediate layers of the network, assessing attention scores and discerning anomalous information from both local and global perspectives. Furthermore, a confidence probability mechanism is implemented at the output layer, again leveraging local and global perspectives to bolster confidence in the results. Additionally, we carefully curate multi-modal datasets for automatic lesson plan scoring research and make them openly available to the community. Quantitative experiments on educational and medical datasets confirm that ICMC outperforms state-of-the-art models (HMCAN, MCAN, HGLNet) by 2.5-6.0% in accuracy and 3.1-7.2% in F1-score, while reducing computational latency by 18%. Cross-domain validation demonstrates 15.7% higher generalizability than transformer-based approaches (CLIP), and attention visualization together with confidence scoring substantiates the framework's interpretability.
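To make the abstract's description of the confidence-driven attention mechanism more concrete, the sketch below illustrates one plausible reading of it: raw cross-modal attention scores are modulated by a learned local confidence gate (per query-key pair) and a global confidence gate (per query token) before normalization, so that anomalous entries are down-weighted. This is not the authors' released implementation; the module name, gating formulation, and local/global fusion are assumptions made only to illustrate the idea.

```python
# Illustrative sketch of a confidence-weighted cross-modal attention block.
# The gating design below is an assumption based solely on the abstract,
# not the authors' ICMC code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConfidenceWeightedAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        # Local confidence: one gate per query/key pair, computed from the raw score.
        self.local_gate = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid())
        # Global confidence: one gate per query token, computed from its features.
        self.global_gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        """x_a: (B, La, D) queries from one modality; x_b: (B, Lb, D) keys/values from another."""
        q, k, v = self.q_proj(x_a), self.k_proj(x_b), self.v_proj(x_b)
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5            # (B, La, Lb)
        local_conf = self.local_gate(scores.unsqueeze(-1)).squeeze(-1)  # (B, La, Lb)
        global_conf = self.global_gate(x_a)                             # (B, La, 1)
        # Down-weight low-confidence (potentially anomalous) attention entries
        # before normalization, combining the local and global views.
        attn = F.softmax(scores * local_conf * global_conf, dim=-1)
        return attn @ v                                                  # (B, La, D)


if __name__ == "__main__":
    layer = ConfidenceWeightedAttention(dim=64)
    text_feats = torch.randn(2, 10, 64)   # e.g., text-modality features
    image_feats = torch.randn(2, 7, 64)   # e.g., image-modality features
    fused = layer(text_feats, image_feats)
    print(fused.shape)                    # torch.Size([2, 10, 64])
```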