Madusanka Nuwan, Malekroodi Hadi Sedigh, Herath H M K K M B, Hewage Chaminda, Yi Myunggi, Lee Byeong-Il
Digital Healthcare Research Center, Pukyong National University, Busan 48513, Republic of Korea.
Industry 4.0 Convergence Bionics Engineering, Pukyong National University, Busan 48513, Republic of Korea.
J Imaging. 2025 Jul 2;11(7):220. doi: 10.3390/jimaging11070220.
This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson's disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0-2 kHz), mid-frequency (2-6 kHz), and high-frequency (above 6 kHz) bands. The framework processes multi-band mel spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components. The network is trained with a supervised contrastive objective that learns discriminative representations separating PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture reaching 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional convolutional neural network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD.
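The abstract does not specify the exact form of the supervised contrastive objective. As a minimal sketch, the block below assumes the widely used supervised contrastive (SupCon) formulation, in which each embedding is pulled toward the other samples sharing its label and pushed away from the rest. The function names, the temperature value of 0.1, and the label coding (1 for PD, 0 for healthy) are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean SupCon loss over all anchors that have at least one positive.

    embeddings: list of equal-length, non-zero float vectors (hypothetical ViG outputs).
    labels:     list of class ids, here assumed 1 = PD and 0 = healthy.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: every other sample with the same label as the anchor.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp over all non-anchor samples, shifted for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D embeddings: two tight clusters.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(toy, [1, 1, 0, 0])   # labels match clusters
loss_shuffled = supervised_contrastive_loss(toy, [1, 0, 1, 0])  # labels cut across clusters
```

On this toy input the loss is small when the labels agree with the geometric clusters and large when they cut across them, which is the behavior a contrastive objective relies on to shape the embedding space.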