Tang Duoxun, Jiang Xinhang, Wang Kunpeng, Guo Weichen, Zhang Jingyuan, Lin Ye, Pu Haibo
College of Science, Sichuan Agricultural University, Ya'an, 625000, China.
College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China.
Sci Rep. 2024 Sep 28;14(1):22495. doi: 10.1038/s41598-024-72066-y.
Face sketch-photo synthesis has important practical applications, such as criminal investigation. Many methods based on convolutional neural networks (CNNs) have been proposed for this task. However, because of the substantial modality differences between sketches and photos, the insensitivity of CNNs to global information, and the insufficient use of hierarchical features, synthesized photos struggle to balance identity preservation and image quality. Recently, State Space Sequence Models (SSMs) have achieved exciting results in computer vision (CV) tasks. Inspired by SSMs, we design a hybrid CNN-SSM model called FaceMamba for the Face Sketch-Photo Synthesis (FSPS) task. It includes an original Face Vision Mamba Attention, which models features in latent space using an SSM. It also incorporates a general auxiliary method called Attention Feature Injection, which uses attention mechanisms to combine encoding features, decoding features, and external auxiliary features. FaceMamba combines Mamba's ability to model long-range dependencies with the strong local feature extraction of CNNs, and uses hierarchical features at the appropriate positions. Extensive experiments and evaluations show that FaceMamba is highly competitive on the FSPS task, achieving the best balance between identity preservation and image quality.
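To illustrate why an SSM can carry information across long ranges, the following is a minimal sketch of a discrete linear state space recurrence over a 1-D sequence. It is not the FaceMamba implementation: the function name `ssm_scan`, the diagonal state transition, and the fixed parameters `a`, `b`, and `c` are assumptions made for illustration, whereas Mamba-style models use input-dependent (selective) parameters and operate on image feature sequences.

```python
from typing import List, Sequence


def ssm_scan(
    xs: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state space recurrence over a sequence.

    For each state dimension i (hypothetical fixed parameters):
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    and the output is a weighted readout of the state:
        y_t = sum_i c[i] * h_t[i]
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")

    h = [0.0] * len(a)  # hidden state, initialised to zero
    ys: List[float] = []
    for x in xs:
        # State update: decay the old state and mix in the new input.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Readout: project the state to a scalar output.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse at the first position still influences every later output,
# which is the long-range behaviour the abstract attributes to Mamba.
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
print(impulse_response)  # [1.0, 0.5, 0.25, 0.125]
```

In this toy setting, the decay factor `a` controls how long an early input persists in the state; a convolution with a small kernel would see only a fixed local window instead.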