Dentamaro Vincenzo, Giglio Paolo, Impedovo Donato, Pirlo Giuseppe, Di Ciano Marco
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6995-7009. doi: 10.1109/TNNLS.2024.3392355. Epub 2025 Apr 4.
Deep learning (DL) has been demonstrated to be a valuable tool for analyzing signals such as sounds and images, thanks to its capability of automatically extracting relevant patterns and its end-to-end training properties. When applied to tabular structured data, however, DL has exhibited performance limitations compared to shallow learning techniques. This work presents a novel technique for tabular data called adaptive multiscale attention deep neural network architecture (also named excited attention). By exploiting parallel multilevel feature weighting, the adaptive multiscale attention successfully learns feature attention and thus achieves high F1-scores on seven different classification tasks (on small, medium, large, and very large datasets) and low mean absolute errors on four regression tasks of different sizes. In addition, adaptive multiscale attention provides four levels of explainability (i.e., comprehension of its learning process and therefore of its outcomes): 1) it calculates attention weights to determine which layers are most important for given classes; 2) it shows each feature's attention across all instances; 3) it exposes the learned feature attention for each class, allowing exploration of feature attention and behavior for specific classes; and 4) it finds nonlinear correlations between co-behaving features to reduce dataset dimensionality and improve interpretability. These interpretability levels, in turn, allow adaptive multiscale attention to be employed as a useful tool for feature ranking and feature selection.
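The core idea of parallel multilevel feature weighting can be illustrated with a minimal sketch: several parallel branches each score the input features, a softmax turns the scores into per-feature attention weights, and the reweighted features are combined. This is only an illustrative NumPy sketch under those assumptions, not the authors' architecture; the function name `multiscale_feature_attention` and the use of one linear scoring matrix per branch are hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_feature_attention(X, weight_mats):
    """Hypothetical sketch of parallel multilevel feature weighting.

    X: (batch, n_features) tabular input.
    weight_mats: one (n_features, n_features) scoring matrix per branch,
    standing in for one attention "scale" each.
    """
    branches, attn_maps = [], []
    for W in weight_mats:
        scores = X @ W             # per-feature scores at this scale
        attn = softmax(scores)     # per-instance feature attention (rows sum to 1)
        attn_maps.append(attn)
        branches.append(X * attn)  # reweight features by their attention
    # Average the parallel branches; the mean attention map can be inspected
    # per feature and per instance, which is what enables feature ranking.
    return np.mean(branches, axis=0), np.mean(attn_maps, axis=0)
```

Inspecting the returned attention map averaged over instances gives a per-feature importance score, which is one way the explainability levels described above could be surfaced.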