Ahamed Md Atik, Cheng Qiang
Department of Computer Science, University of Kentucky, Lexington, KY, USA.
Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA.
Proc (IEEE Conf Multimed Inf Process Retr). 2024 Aug;2024:369-375. doi: 10.1109/mipr62202.2024.00065. Epub 2024 Oct 15.
Despite the prevalence of images and text in machine learning, tabular data remains widely used across many domains. Existing deep learning models, such as convolutional neural networks and transformers, perform well but demand extensive preprocessing and tuning, which limits their accessibility and scalability. This work introduces MambaTab, an approach for tabular data based on a structured state-space model (SSM). SSMs can efficiently extract effective representations from data with long-range dependencies. MambaTab leverages Mamba, an emerging SSM variant, for end-to-end supervised learning on tables. Compared to state-of-the-art baselines, MambaTab delivers superior performance while requiring significantly fewer parameters, as empirically validated on diverse benchmark datasets. Its efficiency, scalability, generalizability, and predictive gains make MambaTab a lightweight, "plug-and-play" solution for diverse tabular data, with promise for enabling wider practical applications.
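For readers unfamiliar with state-space models, the recurrence underlying an SSM can be illustrated with a minimal sketch. This is not the MambaTab implementation: it assumes a plain, time-invariant linear SSM with toy parameters, whereas Mamba additionally makes its parameters input-dependent. The function name `ssm_scan` and the values of `A`, `B`, and `C` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def ssm_scan(
    A: Sequence[Sequence[float]],
    B: Sequence[float],
    C: Sequence[float],
    inputs: Sequence[float],
) -> List[float]:
    """Run the discrete linear recurrence h_k = A h_{k-1} + B u_k, y_k = C h_k.

    A is an n x n state matrix, B and C are length-n vectors, and each
    input u_k is a scalar. The hidden state h starts at zero.
    """
    n = len(B)
    h = [0.0] * n
    outputs: List[float] = []
    for u in inputs:
        # State update: mix the previous state through A and inject the input via B.
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * u for i in range(n)]
        # Readout: project the hidden state to a scalar output via C.
        outputs.append(sum(C[i] * h[i] for i in range(n)))
    return outputs


# Toy (assumed) parameters: two state channels that decay at different rates.
A = [[0.5, 0.0], [0.0, 0.25]]
B = [1.0, 1.0]
C = [1.0, 1.0]

# A single impulse followed by zeros: the output decays as the state carries
# information about the first input forward through later steps.
ys = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

The hidden state summarizes everything seen so far in a fixed-size vector, which is why such models can carry information across long sequences at a cost that grows linearly with sequence length.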