PWLU: Learning Specialized Activation Functions With the Piecewise Linear Unit.

Author Information

Zhu Zezhou, Zhou Yucong, Dong Yuan, Zhong Zhao

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12269-12286. doi: 10.1109/TPAMI.2023.3286109. Epub 2023 Sep 5.

Abstract

The choice of activation function is crucial to deep neural networks. ReLU is a popular hand-designed activation function, while Swish, an automatically searched one, outperforms ReLU on many challenging datasets. However, the search method has two main drawbacks. First, its tree-based search space is highly discrete and restricted, making it difficult to search. Second, the sample-based search method is inefficient at finding specialized activation functions for each dataset or neural architecture. To overcome these drawbacks, we propose a new activation function called the Piecewise Linear Unit (PWLU), incorporating a carefully designed formulation and learning method. PWLU can learn specialized activation functions for different models, layers, or channels. We also propose a non-uniform version of PWLU, which maintains sufficient flexibility while requiring fewer intervals and parameters. Additionally, we generalize PWLU to three-dimensional space to define a piecewise linear surface named 2D-PWLU, which can be treated as a non-linear binary operator. Experimental results show that PWLU achieves state-of-the-art performance across various tasks and models, and that 2D-PWLU outperforms element-wise addition when aggregating features from different branches. The proposed PWLU and its variants are easy to implement and efficient at inference, so they can be widely applied in real-world applications.
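
To make the formulation concrete, below is a minimal sketch of a uniform piecewise linear unit in PyTorch. It assumes a fixed input range split into equal intervals, learnable function values at the breakpoints, and learnable slopes outside the range; the class name, parameter names, and initialization choices are illustrative, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class UniformPWLU(nn.Module):
    """Sketch of a uniform piecewise linear unit (illustrative only):
    the range [left, right] is split into n equal intervals, the value
    at each of the n + 1 breakpoints is a learnable parameter, and the
    function extends linearly beyond the range with learnable slopes."""

    def __init__(self, n_intervals: int = 16, left: float = -3.0, right: float = 3.0):
        super().__init__()
        self.n = n_intervals
        self.left = left
        self.right = right
        # Initialize breakpoint values to ReLU so training starts from a
        # well-understood activation and learns a specialization from there.
        xs = torch.linspace(left, right, n_intervals + 1)
        self.values = nn.Parameter(torch.relu(xs))
        self.slope_left = nn.Parameter(torch.zeros(()))   # slope for x < left
        self.slope_right = nn.Parameter(torch.ones(()))   # slope for x > right

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        width = (self.right - self.left) / self.n
        # Interval index of each input, clamped so out-of-range inputs
        # temporarily map into the first/last interval.
        idx = torch.clamp(((x - self.left) / width).floor(), 0, self.n - 1).long()
        x0 = self.left + idx.to(x.dtype) * width                   # interval's left breakpoint
        slope = (self.values[idx + 1] - self.values[idx]) / width  # slope inside the interval
        y = self.values[idx] + slope * (x - x0)
        # Linear extrapolation outside [left, right].
        y = torch.where(x < self.left, self.values[0] + self.slope_left * (x - self.left), y)
        y = torch.where(x > self.right, self.values[-1] + self.slope_right * (x - self.right), y)
        return y


# Usage: drop-in replacement for ReLU/Swish in any model.
act = UniformPWLU()
print(act(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

The per-layer or per-channel specialization described in the abstract would amount to instantiating an independent set of breakpoint parameters per layer or per channel; the non-uniform variant would additionally learn where the breakpoints sit rather than fixing equal widths.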
