Cheung Tsz-Him, Yeung Dit-Yan
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):13185-13205. doi: 10.1109/TNNLS.2023.3282258. Epub 2024 Oct 7.
Data augmentation is an effective way to improve the generalization of deep learning models. However, most augmentation methods rely on handcrafted operations, such as flipping and cropping for image data, and are designed based on human expertise or repeated trials. By contrast, automated data augmentation (AutoDA) is a promising research direction that frames the data augmentation process as a learning task and searches for the most effective way to augment the data. In this survey, we categorize recent AutoDA methods into composition-, mixing-, and generation-based approaches and analyze each category in detail. Based on the analysis, we discuss the challenges and future prospects, and we provide guidelines for applying AutoDA methods that take into account the dataset, the computation effort, and the availability of domain-specific transformations. We hope this article provides a useful list of AutoDA methods and guidelines for data practitioners deploying AutoDA in practice. The survey can also serve as a reference for researchers pursuing further study in this emerging area.
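As a rough illustration of two of the categories named in the abstract, the sketch below applies a composition-style policy (a sequence of handcrafted operations, each with a probability and a magnitude) and a mixing-style operation that blends two samples. It is a minimal toy under stated assumptions, not a method taken from the survey: the names `OPS`, `apply_policy`, `mixup`, `random_search`, and `score_fn` are hypothetical, images are plain nested lists of floats in [0, 1], and plain random search stands in for whatever search or learning procedure a real AutoDA method would use. Generation-based approaches need a trained generative model and are not shown.

```python
import random

# Handcrafted operations on a 2D image stored as a list of rows of floats.
# Every op takes (img, magnitude) so a policy can call them uniformly.
def hflip(img, magnitude):
    """Mirror each row; magnitude is ignored."""
    return [row[::-1] for row in img]

def vflip(img, magnitude):
    """Reverse the row order; magnitude is ignored."""
    return img[::-1]

def brighten(img, magnitude):
    """Add magnitude to every pixel, clipping at 1.0."""
    return [[min(1.0, p + magnitude) for p in row] for row in img]

OPS = {"hflip": hflip, "vflip": vflip, "brighten": brighten}  # hypothetical op set

def apply_policy(img, policy, rng):
    """Composition: apply each (op_name, probability, magnitude) in order."""
    for name, prob, magnitude in policy:
        if rng.random() < prob:
            img = OPS[name](img, magnitude)
    return img

def mixup(img_a, img_b, lam):
    """Mixing: pixel-wise convex combination of two same-sized images."""
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def random_search(score_fn, num_trials, rng, policy_len=2):
    """Treat policy choice as a search problem: sample policies, keep the best.

    score_fn is an assumed callback; in practice it would train or evaluate
    a model with the candidate policy and return a validation metric.
    """
    best_policy, best_score = None, float("-inf")
    for _ in range(num_trials):
        policy = [(rng.choice(sorted(OPS)), rng.random(), 0.5 * rng.random())
                  for _ in range(policy_len)]
        score = score_fn(policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score
```

In use, `random_search` would be given a `score_fn` that measures validation performance under a candidate policy, and the returned policy would then be passed to `apply_policy` for each training sample. The cost of that evaluation is one reason the abstract lists computation effort among the factors to consider.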