Evolving Fully Automated Machine Learning via Life-Long Knowledge Anchors.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2021 Sep;43(9):3091-3107. doi: 10.1109/TPAMI.2021.3069250. Epub 2021 Aug 4.

Abstract

Automated machine learning (AutoML) has achieved remarkable progress on various tasks, which is attributed to how little manual feature and model design it requires. However, most existing AutoML pipelines cover only parts of the full machine learning pipeline, e.g., neural architecture search or optimizer selection. This leaves potentially important components, such as data cleaning and model ensembling, out of the optimization, and still results in considerable human involvement and suboptimal performance. The main challenges lie in the huge search space formed by combining all possibilities over all components, and in generalizing across different task types such as image, text, and tabular data. In this paper, we present a first-of-its-kind fully automated machine learning pipeline that comprehensively automates data preprocessing, feature engineering, model generation/selection/training, and ensembling for an arbitrary dataset and evaluation metric. Our innovation lies in the comprehensive scope of the learning pipeline, with a novel "life-long" knowledge anchor design to fundamentally accelerate the search over the full search space. Such knowledge anchors record detailed information about pipelines and are integrated with an evolutionary algorithm for joint optimization across components. Experiments demonstrate that the resulting pipeline achieves state-of-the-art performance on multiple datasets and modalities. Specifically, the proposed framework was extensively evaluated in the NeurIPS 2019 AutoDL challenge, where it was the only champion, with a significant margin over other approaches, on all of the image, video, speech, text, and tabular tracks.
