Dong Qifei, Zhang Xiaoyi, Luo Gang
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195, USA.
IEEE Access. 2022;10:63754-63781. doi: 10.1109/access.2022.3181493. Epub 2022 Jun 8.
For many machine learning tasks, deep learning greatly outperforms all other existing learning algorithms. However, constructing a deep learning model on a big data set often takes days or months. During this long process, it is desirable to provide a progress indicator that continuously predicts the remaining model construction time and the percentage of model construction work completed. Recently, we developed the first such method that supports early stopping. That method revises its predicted model construction cost using information gathered at the validation points, where the model's error rate is computed on the validation set. Because validation points are sparse, the resulting progress indicators often take a long time to gather information from enough validation points to produce relatively accurate progress estimates. In this paper, we propose a new progress indication method that overcomes this shortcoming by judiciously inserting extra validation points between the original ones. We implemented this new method in TensorFlow. Our experiments show that, compared with our prior method, the new method reduces the progress indicator's prediction error of the remaining model construction time by 57.5% on average. In addition, the new method has low overhead and lets us obtain relatively accurate progress estimates sooner.
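The abstract names two quantities a progress indicator reports: the remaining construction time and the percentage of work done. It also names one idea, placing extra validation points between the original ones. The sketch below illustrates both under loudly simplified assumptions. It assumes a constant average cost per training step and an estimated total step count supplied from outside. The paper's method revises that estimate at validation points, which is not modeled here. The helper names `progress_estimate` and `densify_validation_points` are hypothetical. The evenly spaced insertion rule is only an illustrative placeholder, not the "judicious" placement the paper proposes.

```python
from typing import List, Tuple


def progress_estimate(elapsed_s: float, steps_done: int,
                      est_total_steps: int) -> Tuple[float, float]:
    """Return (estimated seconds left, percentage of work done).

    Hypothetical helper. Assumes every training step costs the same
    average time and that est_total_steps comes from some external
    predictor (e.g. one revised at validation points).
    """
    if steps_done <= 0 or est_total_steps <= 0:
        raise ValueError("steps_done and est_total_steps must be positive")
    sec_per_step = elapsed_s / steps_done            # average cost so far
    steps_left = max(est_total_steps - steps_done, 0)
    time_left = steps_left * sec_per_step
    pct_done = 100.0 * min(steps_done / est_total_steps, 1.0)
    return time_left, pct_done


def densify_validation_points(original: List[int],
                              extra_per_gap: int) -> List[int]:
    """Insert extra_per_gap evenly spaced steps between consecutive
    original validation steps (hypothetical, illustrative rule)."""
    points: List[int] = []
    for a, b in zip(original, original[1:]):
        points.append(a)
        gap = b - a
        for i in range(1, extra_per_gap + 1):
            step = a + (gap * i) // (extra_per_gap + 1)
            if a < step < b:                          # skip collisions in tiny gaps
                points.append(step)
    if original:
        points.append(original[-1])
    return sorted(set(points))


if __name__ == "__main__":
    print(progress_estimate(500.0, 1000, 4000))
    print(densify_validation_points([0, 1000, 2000], 3))
```

For example, after 1,000 of an estimated 4,000 steps in 500 seconds, the sketch reports 1,500 seconds left and 25% done. With three extra points per gap, validation at steps 0, 1000, 2000 becomes every 250 steps. More frequent validation gives an estimator more observations early, at the cost of extra validation work.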