
Are we ready for a new paradigm shift? A survey on visual deep MLP.

Authors

Liu Ruiyang, Li Yinghui, Tao Linmi, Liang Dun, Zheng Hai-Tao

Affiliations

Department of Computer Science and Technology, BNRist, Tsinghua University & Key Lab of Pervasive Computing, Ministry of Education of China, Beijing 100084, China.

Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China.

Publication

Patterns (N Y). 2022 Jul 8;3(7):100520. doi: 10.1016/j.patter.2022.100520.

Abstract

Recently proposed deep multilayer perceptron (MLP) models have stirred up considerable interest in the vision community. Historically, the availability of larger datasets combined with increased computing capacity has led to paradigm shifts. This review discusses in detail whether MLPs can become a new paradigm for computer vision. We compare the intrinsic connections and differences among convolution, the self-attention mechanism, and the token-mixing MLP. The advantages and limitations of the token-mixing MLP are presented, followed by a careful analysis of recent MLP-like variants, from module design to network architecture, and their applications. In the graphics-processing-unit era, locally and globally weighted summations are the current mainstream, represented by convolution, the self-attention mechanism, and MLPs. We suggest that further development of the paradigm be considered alongside next-generation computing devices.
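The token-mixing MLP that the abstract contrasts with convolution and self-attention can be illustrated with a minimal NumPy sketch in the style of MLP-Mixer: a shared two-layer MLP is applied along the token (spatial) axis, independently for each channel. The variable names and dimensions below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def token_mixing_mlp(x, w1, w2):
    """Token-mixing MLP: mixes information ACROSS tokens (spatial positions)
    by applying one shared MLP along the token axis, per channel.

    x:  (tokens, channels) input feature map, flattened into tokens
    w1: (tokens, hidden)   first layer, acting on the token dimension
    w2: (hidden, tokens)   second layer, projecting back to tokens
    """
    y = x.T @ w1              # (channels, hidden): MLP sees the token axis
    y = np.maximum(y, 0)      # ReLU here for simplicity (MLP-Mixer uses GELU)
    y = y @ w2                # (channels, tokens)
    return y.T                # back to (tokens, channels)

rng = np.random.default_rng(0)
tokens, channels, hidden = 16, 8, 32
x = rng.standard_normal((tokens, channels))
w1 = rng.standard_normal((tokens, hidden)) * 0.1
w2 = rng.standard_normal((hidden, tokens)) * 0.1
out = token_mixing_mlp(x, w1, w2)
print(out.shape)  # (16, 8): same shape as the input, but tokens are mixed
```

Unlike convolution (local weighted summation) or self-attention (globally weighted summation with input-dependent weights), the weights `w1` and `w2` here are fixed global mixing matrices shared by all channels, which is the key design point the survey analyzes.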


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9657/9278509/898830b34283/gr1.jpg
