Algorithmics Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands.
PLoS One. 2023 Feb 1;18(2):e0261029. doi: 10.1371/journal.pone.0261029. eCollection 2023.
Recent work has shown potential in using Mixed Integer Programming (MIP) solvers to optimize certain aspects of neural networks (NNs). However, the intriguing approach of training NNs with MIP solvers is under-explored. State-of-the-art methods for training NNs are typically gradient-based and require significant data, computation on GPUs, and extensive hyper-parameter tuning. In contrast, training with MIP solvers requires neither GPUs nor heavy hyper-parameter tuning, but currently cannot handle anything but small amounts of data. This article builds on recent advances that train binarized NNs using MIP solvers. We go beyond current work by formulating new MIP models that improve training efficiency and can train the important class of integer-valued neural networks (INNs). We provide two novel methods that further the potential of using MIP to train NNs. The first method optimizes the number of neurons in the NN while training, reducing the need to decide on the network architecture before training. The second method addresses the amount of training data that MIP can feasibly handle: we provide a batch training method that dramatically increases the amount of data MIP solvers can use for training. We thus provide a promising step towards training NNs with much more data than previous MIP models could accommodate. Experimental results on two real-world data-limited datasets demonstrate that our approach strongly outperforms the previous state of the art in training NNs with MIP, in terms of accuracy, training time, and amount of data. Our methodology is proficient at training NNs when minimal training data is available, and at training with minimal memory requirements, which is potentially valuable for deploying to low-memory devices.
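To give a concrete (though heavily simplified) sense of the idea, the sketch below is not the authors' MIP formulation; it merely illustrates that training an integer-valued neuron is a discrete optimization problem. At toy scale the integer weight space can be enumerated exhaustively, whereas a MIP solver searches the same space far more efficiently at realistic scale. The dataset, weight bound, and function names here are hypothetical choices for illustration only.

```python
# Illustrative sketch only: training an integer-valued neuron posed as
# discrete optimization. A MIP solver would explore this search space
# intelligently; here we brute-force it because the toy problem is tiny.
from itertools import product

def sign(x):
    # Threshold activation with labels in {-1, +1}.
    return 1 if x >= 0 else -1

# Hypothetical toy dataset: 2 features, binary labels in {-1, +1}.
X = [(1, 2), (2, 1), (-1, -2), (-2, -1)]
y = [1, 1, -1, -1]

def train_integer_neuron(X, y, bound=1):
    """Find integer weights and bias in [-bound, bound] minimizing 0-1 loss."""
    best_errors, best_params = None, None
    domain = range(-bound, bound + 1)
    for w1, w2, b in product(domain, repeat=3):
        errors = sum(
            sign(w1 * x1 + w2 * x2 + b) != label
            for (x1, x2), label in zip(X, y)
        )
        if best_errors is None or errors < best_errors:
            best_errors, best_params = errors, (w1, w2, b)
    return best_errors, best_params

errors, weights = train_integer_neuron(X, y)
```

On this linearly separable toy data an error-free integer weight assignment exists, so the search returns zero misclassifications. The combinatorial explosion of this enumeration as networks grow is precisely why a MIP formulation, with its branch-and-bound pruning, is needed in practice.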