A hybrid model based on transformer and Mamba for enhanced sequence modeling.

Author information

Zhu Xiaocui, Ruan Qunsheng, Qian Sai, Zhang Miaohui

Affiliations

Jiangxi Academy of Sciences, Institute of Energy, Nanchang, 330029, Jiangxi, China.

Department of nature science and computer, Ganzhou Teachers College, Ganzhou, 341000, Jiangxi, China.

Publication information

Sci Rep. 2025 Apr 3;15(1):11428. doi: 10.1038/s41598-025-87574-8.

DOI:10.1038/s41598-025-87574-8

PMID:40180947

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11968869/

Abstract

State Space Models (SSMs) have made remarkable strides in language modeling in recent years. With the introduction of Mamba, these models have garnered increased attention, often surpassing Transformers in specific areas. Nevertheless, despite Mamba's unique strengths, Transformers remain essential due to their advanced computational capabilities and proven effectiveness. In this paper, we propose a novel model that effectively integrates the strengths of both Transformers and Mamba. Specifically, our model utilizes the Transformer's encoder for encoding tasks while employing Mamba as the decoder for decoding tasks. We introduce a feature fusion technique that combines the features generated by the encoder with the hidden states produced by the decoder. This approach successfully merges the advantages of the Transformer and Mamba, resulting in enhanced performance. Comprehensive experiments across various language tasks demonstrate that our proposed model consistently achieves competitive results, outperforming existing benchmarks.
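The encoder, decoder, and fusion pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and every function name here is hypothetical. It makes three simplifying assumptions: the encoder is a single self-attention layer with identity projections, the decoder is a plain linear recurrence with fixed scalars `a` and `b` (Mamba itself uses input-dependent, selective parameters), and the fusion is a gated blend with the mean-pooled encoder output, since the abstract does not specify the fusion operator.

```python
import math

def self_attention(x):
    """Encoder stand-in: single-head self-attention with identity projections."""
    d = len(x[0])
    out = []
    for q in x:
        # Scaled dot-product scores of this query against every position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[j] for w, v in zip(weights, x))
                    for j in range(d)])
    return out

def ssm_decode(u, a=0.9, b=0.1):
    """Decoder stand-in: per-channel recurrence h_t = a * h_{t-1} + b * u_t.

    Assumption: a and b are fixed scalars, so this is a non-selective SSM.
    """
    h = [0.0] * len(u[0])
    states = []
    for u_t in u:
        h = [a * h_j + b * u_j for h_j, u_j in zip(h, u_t)]
        states.append(h)
    return states

def fuse(encoder_features, decoder_states, gate=0.5):
    """Hypothetical fusion: blend mean-pooled encoder features into each state."""
    n, d = len(encoder_features), len(encoder_features[0])
    context = [sum(row[j] for row in encoder_features) / n for j in range(d)]
    return [[gate * h_j + (1 - gate) * c_j for h_j, c_j in zip(h, context)]
            for h in decoder_states]

source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy source embeddings
target = [[0.5, 0.5], [1.0, 0.0]]              # toy decoder inputs
encoded = self_attention(source)
states = ssm_decode(target)
fused = fuse(encoded, states)
```

Here `fused` has one vector per decoder step, each mixing the recurrent hidden state with a summary of the encoded source. A trained model would replace the identity projections, the fixed recurrence scalars, and the constant `gate` with learned parameters.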
