Vecchi Edoardo, Bassetti Davide, Graziato Fabio, Pospíšil Lukáš, Horenko Illia
Università della Svizzera Italiana, Faculty of Informatics, Institute of Computing, 6962 Lugano, Switzerland
Technical University of Kaiserslautern, Faculty of Mathematics, Group of Mathematics of AI, 67663 Kaiserslautern, Germany
Neural Comput. 2024 May 10;36(6):1198-1227. doi: 10.1162/neco_a_01664.
Small data learning problems are characterized by a significant discrepancy between the limited number of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to distinguish the features important for the classification task from those that bear no relevant information, and cannot derive an appropriate learning rule that allows discriminating among different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the gauge-optimal approximate learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists of piecewise-linear functions in the Euclidean space and that it can be approximated through a monotonically convergent algorithm that, under the assumption of a discrete segmentation of the feature space, admits a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Niño Southern Oscillation and inference of epigenetically induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems in both learning performance and computational cost.
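The abstract's claim of a monotonically convergent scheme with closed-form substeps can be illustrated with a toy alternating minimization. The sketch below is not the GOAL algorithm and omits its classification term. It only assumes a simplified objective, the squared reconstruction error of each data point by a segment centroid living in a rotated, reduced space. The function name `toy_alternating_fit`, the parameters `d` and `K`, and the objective itself are all hypothetical choices made for illustration.

```python
import numpy as np

def toy_alternating_fit(X, d=2, K=3, n_iter=20, seed=0):
    """Toy alternating minimization (an illustration, NOT the GOAL algorithm).

    Assumed objective: sum_t ||x_t - P c_{label(t)}||^2, where P (D x d) has
    orthonormal columns, C (K x d) holds segment centroids in the reduced
    space, and labels is a discrete segmentation of the T data points.
    """
    rng = np.random.default_rng(seed)
    T, D = X.shape
    # Random orthonormal starting projection.
    P, _ = np.linalg.qr(rng.standard_normal((D, d)))
    # Initial centroids: K distinct projected data points.
    C = (X @ P)[rng.choice(T, size=K, replace=False)]
    history = []
    for _ in range(n_iter):
        Z = X @ P  # reduced coordinates, T x d
        # Segmentation step: the nearest centroid in the reduced space is
        # also the minimizer of ||x_t - P c_k||^2 because P^T P = I.
        dist = ((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Centroid step: the closed-form minimizer is the segment mean.
        for k in range(K):
            mask = labels == k
            if mask.any():  # an empty segment keeps its previous centroid
                C[k] = Z[mask].mean(axis=0)
        # Rotation step: orthogonal Procrustes solution via a thin SVD.
        M = C[labels]  # T x d, centroid assigned to each point
        U, _, Vt = np.linalg.svd(X.T @ M, full_matrices=False)
        P = U @ Vt
        history.append(float(((X - M @ P.T) ** 2).sum()))
    return P, C, labels, history
```

Each of the three substeps minimizes the assumed objective exactly with the other two blocks held fixed, so the recorded values in `history` cannot increase, and every substep touches each data point a fixed number of times, so the cost per iteration grows linearly with the number of observations. How the actual GOAL objective and its substeps are defined is given in the paper itself, not here.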