Eslamian Ali, Cheng Qiang
Department of Computer Science, University of Kentucky, 329 Rose Street, Lexington, Kentucky 40506, USA.
Institute for Biomedical Informatics, University of Kentucky, 800 Rose Street, Lexington, Kentucky 40506, USA.
Pattern Anal Appl. 2025 Jun;28(2). doi: 10.1007/s10044-025-01423-y. Epub 2025 Feb 21.
Tabular data, prevalent in relational databases and spreadsheets, is fundamental to fields such as healthcare, engineering, and finance. Despite significant advances in tabular data learning, critical challenges remain: handling missing values, addressing class imbalance, enabling transfer learning, and supporting feature incremental learning beyond traditional supervised paradigms. We introduce TabMixer, a model that extends the multilayer perceptron (MLP) mixer architecture to address these challenges. TabMixer incorporates a self-attention mechanism, which makes it applicable across supervised learning, transfer learning, and feature incremental learning. Extensive experiments on eight public datasets show that TabMixer outperforms existing state-of-the-art methods. Notably, TabMixer achieved substantial improvements in ANOVA AUC across all scenarios: a 4% increase in supervised learning (0.840 to 0.881), 8% in transfer learning (0.803 to 0.872), and 4% in feature incremental learning (0.806 to 0.843). TabMixer is also computationally efficient and scalable, requiring fewer floating-point operations and fewer learnable parameters. Moreover, it is resilient to missing values and class imbalance, through both its architectural design and optional preprocessing enhancements. These results establish TabMixer as a promising model for tabular data analysis and a valuable tool for diverse applications.
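The abstract gives each AUC improvement both as a percentage and as a pair of values, without saying whether the percentage is an absolute difference or a relative one. The short sketch below recomputes both readings from the reported pairs. The scenario keys, the `auc_gains` helper, and the treatment of the first value in each pair as the comparison baseline are assumptions made for illustration; they are not taken from the paper.

```python
# AUC pairs as reported in the abstract: (value before, value with TabMixer).
# Treating the first value as the comparison baseline is an assumption.
reported = {
    "supervised": (0.840, 0.881),
    "transfer": (0.803, 0.872),
    "feature_incremental": (0.806, 0.843),
}


def auc_gains(baseline, improved):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    difference = improved - baseline
    absolute_points = difference * 100
    relative_percent = difference / baseline * 100
    return round(absolute_points, 1), round(relative_percent, 1)


# Hypothetical helper applied to every scenario named in the abstract.
gains = {name: auc_gains(*pair) for name, pair in reported.items()}

for name, (absolute_points, relative_percent) in gains.items():
    print(f"{name}: +{absolute_points} points absolute, "
          f"+{relative_percent}% relative")
```

Printing both readings next to the abstract's rounded figures makes it easier to judge which convention a given percentage most plausibly follows.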