Alrabie Sami, Barnawi Ahmed
Faculty of Computing and Information Technology (FCIT), King Abdulaziz University, Jeddah 21589, Saudi Arabia.
Bioengineering (Basel). 2025 May 22;12(6):558. doi: 10.3390/bioengineering12060558.
In recent years, deep learning has shown promise in automating heart-sound classification. Although auscultation-based diagnosis is fast, non-invasive, and cost-effective, its accuracy still depends mainly on the clinician's expertise, making rare or complex conditions particularly difficult to detect. This study is motivated by two key concerns in the field of heart-sound classification. First, we observed that automatic heart-sound segmentation algorithms, commonly used for data augmentation, produce varying outcomes, raising concerns about the accuracy of the segmentation process and the reliability of the resulting classification performance. Second, we noticed inconsistent accuracy scores across different pretrained models, prompting the need for interpretable explanations to validate these results. We argue that, without interpretability to support the reported metrics, accuracy scores can be misleading because of ambiguity in how the training data interact with pretrained models. Specifically, it remains unclear whether these models classify spectrogram images generated from heart-sound signals in a way that aligns with clinical reasoning, in which experts focus on specific components of the cardiac cycle, such as S1, systole, S2, and diastole. To address this, we applied explainable AI (XAI) techniques with two primary objectives: (1) to assess whether the model truly focuses on clinically relevant features, so that classification results can be verified and trusted, and (2) to investigate whether incorporating attention mechanisms can improve both the performance and the model's focus on meaningful segments of the signal. To the best of our knowledge, this is the first study conducted on a manually segmented dataset that objectively evaluates model behavior using XAI and explores performance enhancement by combining attention mechanisms with pretrained models. We employ the Grad-CAM method to visualize the model's attention and gain insight into its decision-making process. The experimental results show that integrating multi-head attention significantly improves both classification accuracy and interpretability. Notably, ResNet50 with multi-head attention achieved an accuracy of 97.3%, outperforming both the baseline and SE-enhanced models. Moreover, the mean intersection over union (mIoU) for interpretability increased from 75.7% to 82.0%, indicating improved focus on diagnostically relevant regions.
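The attention-enhanced architecture described above lends itself to a compact sketch. Below is a minimal PyTorch example, assuming the multi-head attention operates on the flattened feature map of the final ResNet50 stage before pooling; the head count, pooling strategy, and number of classes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch (not the paper's released code): ResNet50 backbone whose
# final convolutional feature map is refined by multi-head self-attention.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class ResNet50MHA(nn.Module):
    def __init__(self, num_classes=2, num_heads=8):  # class count is assumed
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V2")  # ImageNet-pretrained
        # Keep everything up to and including the last residual stage.
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.attn = nn.MultiheadAttention(embed_dim=2048, num_heads=num_heads,
                                          batch_first=True)
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        f = self.features(x)                          # (B, 2048, H, W)
        tokens = f.flatten(2).transpose(1, 2)         # (B, H*W, 2048) tokens
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention
        pooled = attended.mean(dim=1)                 # average over tokens
        return self.fc(pooled)

model = ResNet50MHA(num_classes=5)
logits = model(torch.randn(1, 3, 224, 224))          # spectrogram-sized input
```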
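Grad-CAM itself can be reproduced in a few lines. The sketch below hooks the last residual block of a torchvision ResNet50; the layer choice and random input stand in for the paper's exact pipeline and are assumptions.

```python
# Minimal Grad-CAM sketch: weight the target layer's activations by the
# spatially averaged gradients of the top-class logit, then ReLU and upsample.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V2").eval()
acts, grads = {}, {}

def fwd_hook(_, __, output):
    acts["v"] = output.detach()

def bwd_hook(_, grad_in, grad_out):
    grads["v"] = grad_out[0].detach()

target_layer = model.layer4[-1]                      # assumed target layer
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)                      # placeholder spectrogram
logits = model(x)
logits[0, logits.argmax()].backward()                # gradient of top class

weights = grads["v"].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
cam = F.relu((weights * acts["v"]).sum(dim=1))       # weighted combination
cam = F.interpolate(cam.unsqueeze(1), size=(224, 224),
                    mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0,1]
```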
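The mIoU interpretability score can then be obtained by binarizing each Grad-CAM heatmap and comparing it with expert-annotated masks of the heart-cycle components (S1, systole, S2, diastole). The 0.5 threshold and the mask format below are assumptions about the evaluation protocol, not details taken from the paper.

```python
# Mean IoU between thresholded heatmaps and expert-annotated binary masks.
import numpy as np

def iou(cam: np.ndarray, mask: np.ndarray, thresh: float = 0.5) -> float:
    pred = cam >= thresh                              # binarize the heatmap
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    return inter / union if union else 0.0

# Dummy data in place of real heatmaps and annotation masks.
cams = [np.random.rand(224, 224) for _ in range(3)]
masks = [np.random.rand(224, 224) > 0.5 for _ in range(3)]
miou = float(np.mean([iou(c, m) for c, m in zip(cams, masks)]))
print(f"mIoU = {miou:.3f}")
```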