Department of Computer Science City, University of London, United Kingdom.
Neural Netw. 2021 Oct;142:238-251. doi: 10.1016/j.neunet.2021.05.012. Epub 2021 May 14.
We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks. While transfer learning assumes that the learning process for a target task will benefit from re-using representations learned for another task, anti-transfer avoids re-using representations learned for an orthogonal task, i.e., one that is irrelevant to and potentially confounding for the target task, such as speaker identity for speech recognition or speech content for emotion recognition. This extends the potential use of pre-trained models, which have become increasingly available. In anti-transfer learning, we penalize similarity between the activations of a network being trained on a target task and those of a network previously trained on an orthogonal task, which yields more suitable representations. This leads to better generalization and provides a degree of control over correlations that are spurious or undesirable, e.g., to avoid social bias. We have implemented anti-transfer for convolutional neural networks in different configurations with several similarity metrics and aggregation functions, which we evaluate and analyze on several speech and audio tasks and settings, using six datasets. We show that anti-transfer does lead to the intended invariance to the orthogonal task and to features more appropriate for the target task at hand. Anti-transfer learning consistently improves classification accuracy in all test cases. While anti-transfer incurs computation and memory costs at training time, the additional computation cost is relatively small when pre-trained models for the orthogonal tasks are already available. Anti-transfer is widely applicable and particularly useful where a specific invariance is desirable or where labeled data for orthogonal tasks are difficult to obtain for a given dataset but pre-trained models are available.
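The core mechanism described above can be sketched as a penalty term added to the target-task loss: the higher the similarity between the trained network's activations and the frozen orthogonal-task network's activations at a chosen layer, the larger the loss. The sketch below is a minimal, framework-free illustration; the choice of cosine similarity as the metric, the aggregation over a single layer, and the weight `penalty_weight` are illustrative assumptions (the paper evaluates several similarity metrics and aggregation functions).

```python
import math

def cosine_similarity(a, b):
    # Similarity metric between two flattened activation vectors.
    # One of several metrics the method could use; cosine shown here.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def anti_transfer_loss(task_loss, target_acts, orthogonal_acts,
                       penalty_weight=0.1):
    """Total loss = task loss + weight * activation-similarity penalty.

    target_acts:     activations of the network being trained
    orthogonal_acts: activations of the frozen network pre-trained
                     on the orthogonal task (same layer, same input)
    penalty_weight:  illustrative hyperparameter balancing the terms
    """
    # Penalizing |similarity| pushes the trained representations
    # away from those encoding the orthogonal task.
    penalty = abs(cosine_similarity(target_acts, orthogonal_acts))
    return task_loss + penalty_weight * penalty
```

For a given input, activations that align with the orthogonal network are penalized (e.g., identical vectors give a penalty of `penalty_weight * 1.0`), while orthogonal activation directions add no penalty, steering training toward representations invariant to the confounding task.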