He Zhu, Lin Mingwei, Luo Xin, Xu Zeshui
IEEE Trans Neural Netw Learn Syst. 2025 Jul;36(7):13021-13035. doi: 10.1109/TNNLS.2024.3490800.
The selection and utilization of different color spaces significantly impact the recognition performance of deep learning models in downstream tasks. Existing studies typically leverage image information from various color spaces through model integration or channel concatenation. However, these methods result in excessive model size and suboptimal utilization of image information. In this study, we propose the structure-preserved self-attention network (SPSANet) model for efficient fusion of image information from different color spaces. This model incorporates a novel structure-preserved self-attention (SPSA) module that employs a single-head pixel-wise attention mechanism, as opposed to the conventional multihead self-attention (MHSA) approach. Specifically, feature maps from all color space grouping paths are utilized for similarity matching, enabling the model to focus on critical pixel locations across different color spaces. This design mitigates the dependence of the SPSANet model on the choice of color space while enhancing the advantages of integrating multiple color spaces. The SPSANet model also employs channel shuffle operations to facilitate limited interaction between information flows from different color space paths. Experimental results demonstrate that the SPSANet model, utilizing eight common color spaces (RGB, Luv, XYZ, Lab, HSV, YCrCb, YUV, and HLS), achieves superior recognition performance with reduced parameters and computational cost.
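The abstract does not give the exact SPSA formulation, but the core ideas it describes (single-head pixel-wise attention over features concatenated from several color-space paths, followed by a channel shuffle that mixes the paths) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the function names `spsa_single_head` and `channel_shuffle` and the toy shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_shuffle(x, groups):
    # x: (C, H, W); interleave channels across the color-space groups
    # so that each output group sees channels from every input path.
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

def spsa_single_head(paths):
    # paths: list of (C, H, W) feature maps, one per color-space path.
    # Concatenate along channels, treat each pixel as a token, and compute
    # a single attention map shared across all paths: similarity matching
    # uses features from every color space at once, so attention weights
    # highlight pixel locations that matter across color spaces.
    x = np.concatenate(paths, axis=0)            # (C_total, H, W)
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T               # (N, C_total), N = H*W pixels
    attn = softmax(tokens @ tokens.T / np.sqrt(c), axis=-1)  # (N, N)
    out = attn @ tokens                          # attended pixel features
    return out.T.reshape(c, h, w)

# Toy usage: three hypothetical color-space paths, 4 channels each, 8x8 maps.
rng = np.random.default_rng(0)
paths = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
fused = spsa_single_head(paths)
mixed = channel_shuffle(fused, groups=3)
print(fused.shape, mixed.shape)  # (12, 8, 8) (12, 8, 8)
```

The single-head design keeps one (N, N) attention map instead of one per head, which is consistent with the abstract's claim of reduced parameters and computation relative to MHSA; the shuffle then provides the "limited interaction" between path information flows without any extra learned parameters.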