School of Engineering, RMIT University, Melbourne VIC 3001, Australia.
School of Science, RMIT University, Melbourne VIC 3001, Australia.
Neural Netw. 2020 Aug;128:345-357. doi: 10.1016/j.neunet.2020.05.011. Epub 2020 May 18.
Continual learning is the ability of a learning system to solve new tasks by utilizing knowledge acquired from learning and performing prior tasks, without significantly degrading that prior knowledge. Continual learning is key to advancing machine learning and artificial intelligence. Progressive learning is a deep learning framework for continual learning that comprises three procedures: curriculum, progression, and pruning. The curriculum procedure actively selects the next task to learn from a set of candidate tasks. The progression procedure grows the capacity of the model by adding new parameters that leverage parameters learned in prior tasks, while learning from the data available for the new task at hand, without being susceptible to catastrophic forgetting. The pruning procedure counteracts the growth in the number of parameters as further tasks are learned, and also mitigates negative forward transfer, in which prior knowledge unrelated to the task at hand interferes with and worsens performance. Progressive learning is evaluated on a number of supervised classification tasks in the image recognition and speech recognition domains to demonstrate its advantages over baseline methods. It is shown that, when tasks are related, progressive learning leads to faster learning that converges to better generalization performance using a smaller number of dedicated parameters.
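The progression and pruning procedures described above can be illustrated with a minimal toy sketch. This is a hypothetical, simplified rendering of the two ideas (one frozen parameter block per learned task, then magnitude-based pruning of the newest block), not the paper's actual implementation; the class and method names are invented for illustration.

```python
import random

class ProgressiveModel:
    """Toy sketch: progression adds a new parameter block per task,
    pruning shrinks the newest block (hypothetical API)."""

    def __init__(self):
        # One parameter block ("column") per learned task; earlier
        # blocks are treated as frozen, so prior knowledge is preserved.
        self.columns = []

    def progress(self, n_params):
        # Progression: grow capacity with a fresh trainable block for
        # the new task, leaving previously learned blocks untouched.
        new_col = [random.uniform(-1.0, 1.0) for _ in range(n_params)]
        self.columns.append(new_col)
        return new_col

    def prune(self, keep_ratio=0.5):
        # Pruning: keep only the largest-magnitude weights in the
        # newest block to counteract parameter growth across tasks.
        col = self.columns[-1]
        k = max(1, int(len(col) * keep_ratio))
        thresh = sorted((abs(w) for w in col), reverse=True)[k - 1]
        self.columns[-1] = [w if abs(w) >= thresh else 0.0 for w in col]

model = ProgressiveModel()
model.progress(8)    # task 1: add a block of 8 parameters
model.progress(8)    # task 2: add another block; task-1 block stays frozen
model.prune(0.5)     # prune the task-2 block to half its weights
print(len(model.columns))
```

In a real network the new block would also receive lateral connections from the frozen blocks (so that new parameters "leverage parameters learned in prior tasks"), and pruning would be guided by task performance rather than raw magnitude; those details are omitted here for brevity.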