Toward identity preserving in face sketch-photo synthesis using a hybrid CNN-Mamba framework.

Author information

Tang Duoxun, Jiang Xinhang, Wang Kunpeng, Guo Weichen, Zhang Jingyuan, Lin Ye, Pu Haibo

Affiliations

College of Science, Sichuan Agricultural University, Ya'an, 625000, China.

College of Information Engineering, Sichuan Agricultural University, Ya'an, 625000, China.

Publication information

Sci Rep. 2024 Sep 28;14(1):22495. doi: 10.1038/s41598-024-72066-y.

Abstract

Face sketch-photo synthesis has important practical applications, such as criminal investigation. Many methods based on convolutional neural networks (CNNs) have been proposed for this task. However, synthesized photos struggle to balance identity preservation and image quality for three reasons: the substantial modal gap between sketches and photos, CNNs' insensitivity to global information, and insufficient use of hierarchical features. Recently, State Space Sequence Models (SSMs) have achieved promising results in computer vision (CV) tasks. Inspired by SSMs, we design a hybrid CNN-SSM model called FaceMamba for the Face Sketch-Photo Synthesis (FSPS) task. It includes a novel Face Vision Mamba Attention module that uses an SSM to model features in the latent space. It also incorporates a general auxiliary method called Attention Feature Injection, which uses attention mechanisms to combine encoder features, decoder features, and external auxiliary features. FaceMamba combines Mamba's ability to model long-range dependencies with the strong local feature extraction of CNNs, and exploits hierarchical features at appropriate positions. Thorough experiments and evaluations show that FaceMamba is highly competitive on the FSPS task and achieves the best balance between identity preservation and image quality.
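The abstract credits the SSM branch with modeling long-range dependencies that CNNs miss. The sketch below illustrates why. It runs a basic time-invariant diagonal SSM over a feature map that has been flattened row by row. The SSM is discretized with zero-order hold, in the style of S4. As a result, each output position can depend on every earlier position in the scan, not just on a local neighborhood.

This is a simplification under stated assumptions. Mamba's selective scan makes the step size and the B and C parameters input-dependent, and this sketch does not do that. It is also not FaceMamba's Face Vision Mamba Attention module. All function names and parameters here are hypothetical.

```python
import numpy as np


def discretize_zoh(A_diag, B, delta):
    """Zero-order-hold discretization for a diagonal state matrix.

    A_diag: (N,) negative reals (stable, decaying modes); B: (N,); delta: step size.
    Returns A_bar = exp(delta*A) and B_bar = (exp(delta*A) - 1) / A * B.
    """
    A_bar = np.exp(delta * A_diag)
    B_bar = (A_bar - 1.0) / A_diag * B
    return A_bar, B_bar


def ssm_scan(x, A_diag, B, C, delta):
    """Run h_t = A_bar*h_{t-1} + B_bar*x_t, y_t = C.h_t over a 1-D sequence x of length L."""
    A_bar, B_bar = discretize_zoh(A_diag, B, delta)
    h = np.zeros_like(A_diag, dtype=float)  # hidden state starts at zero
    y = np.empty(len(x))
    for t, xt in enumerate(x):
        h = A_bar * h + B_bar * xt  # elementwise update (A is diagonal)
        y[t] = C @ h                # project the state to a scalar output
    return y


def scan_feature_map(fmap, A_diag, B, C, delta):
    """Flatten each channel of a (channels, H, W) map row-major and scan it with the same SSM.

    Hypothetical illustration only: real vision Mamba variants use several scan
    directions and learned, input-dependent parameters.
    """
    ch, H, W = fmap.shape
    seq = fmap.reshape(ch, H * W)
    out = np.stack([ssm_scan(s, A_diag, B, C, delta) for s in seq])
    return out.reshape(ch, H, W)
```

Suppose an impulse is placed at one position and A has negative entries. The scan then produces zero output at earlier positions and a decaying but nonzero response at every later position, however far away. A single 3x3 convolution, by contrast, only reaches immediately neighboring pixels.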

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b97d/11438986/0f2df2f5f6f8/41598_2024_72066_Fig1_HTML.jpg