Zhang Yexun, Zhang Ya, Cai Wenbin
IEEE Trans Image Process. 2020 Jan 31. doi: 10.1109/TIP.2020.2969081.
Image style transfer has drawn broad attention recently. However, most existing methods aim to explicitly model the transformation between different styles, and the learned model is often not generalizable to new styles. Based on the idea of style and content separation, we propose a unified style transfer framework that consists of a style encoder, a content encoder, a mixer, and a decoder. The style encoder and the content encoder extract the style and content representations from the corresponding reference images. The two representations are integrated by the mixer and fed to the decoder, which generates images with the target style and content. Assuming the same encoder can be shared among different styles/contents, the style/content encoder learns a generalizable way to represent style/content information, i.e., the encoders are expected to capture the underlying representation across different styles/contents and to generalize to new ones. By training simultaneously on multiple styles and contents, the framework builds a single transfer network that handles multiple styles, which leads to its key merit: generalizability to new styles and contents. To evaluate the proposed framework, we apply it to both supervised and unsupervised style transfer, using character typeface transfer and neural style transfer as respective examples. For character typeface transfer, to separate the style features and content features, we leverage the conditional dependence of styles and contents given an image. For neural style transfer, we leverage the statistical information of feature maps in certain layers to represent style. Extensive experimental results demonstrate the effectiveness and robustness of the proposed methods. Furthermore, models learned under the proposed framework are shown to generalize better to new styles and contents.
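A minimal sketch of the encoder-mixer-decoder layout described in the abstract, written in PyTorch. The layer sizes, the concatenation-based mixer, and the image resolution are illustrative assumptions, not the authors' exact architecture; the point is only that one shared style encoder and one shared content encoder feed a mixer whose output drives the decoder.

```python
# Illustrative sketch (assumed shapes and layers, not the paper's exact design).
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Shared encoder: maps a reference image to a fixed-size code."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class Mixer(nn.Module):
    """Integrates a style code and a content code into one latent vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(2 * dim, dim)
    def forward(self, style_code, content_code):
        return self.fc(torch.cat([style_code, content_code], dim=1))

class Decoder(nn.Module):
    """Generates an image carrying the target style and content."""
    def __init__(self, in_dim=128):
        super().__init__()
        self.fc = nn.Linear(in_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        h = self.fc(z).view(-1, 128, 8, 8)
        return self.net(h)

# One style encoder and one content encoder are shared across all
# styles/contents; only the reference images change at inference time.
style_enc, content_enc = ConvEncoder(), ConvEncoder()
mixer, decoder = Mixer(), Decoder()

style_refs = torch.randn(4, 3, 64, 64)    # reference images in the target style
content_refs = torch.randn(4, 3, 64, 64)  # reference images with the target content
out = decoder(mixer(style_enc(style_refs), content_enc(content_refs)))
print(out.shape)  # torch.Size([4, 3, 64, 64])
```

Because the encoders are shared rather than style-specific, transferring to an unseen style only requires swapping in new reference images, with no retraining of a per-style network.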