Siriborvornratanakul Thitirat
Graduate School of Applied Statistics, National Institute of Development Administration (NIDA), 148 SeriThai Rd., Bangkapi, Bangkok, 10240 Thailand.
J Big Data. 2022;9(1):96. doi: 10.1186/s40537-022-00646-8. Epub 2022 Jul 20.
The emergence of automated machine learning (AutoML) has given rise to a trend of no-code and low-code machine learning, in which most tasks in the machine learning pipeline can potentially be automated without support from human data scientists. While it sounds reasonable to leave the repetitive trial-and-error tasks of designing complex network architectures and tuning many hyperparameters to AutoML, leading research that uses AutoML is still scarce. Therefore, the overall purpose of this case study is to investigate the gap between current AutoML frameworks and practical machine learning development.
First, this paper confirms the increasing trend of AutoML via an indirect indicator: the number of search results in Google Trends, IEEE Xplore, and ACM Digital Library during 2012-2021. Then, the three most popular AutoML frameworks (i.e., Auto-Sklearn, AutoKeras, and Google Cloud AutoML) are inspected as AutoML's representatives; the inspection covers six comparative aspects. Based on the features available in the three frameworks, our case study then examines recent machine learning research in the context of image-based machine learning. We choose this context because computer vision spans several levels of machine learning, from basic to advanced, and has recently been one of the most popular fields for studying machine learning and artificial intelligence. Our study focuses specifically on image-based road health inspection systems, as this topic has a long history in computer vision, allowing us to observe how solutions have changed from past to present.
After confirming the rising numbers of AutoML search results in the three search engines, our study of the three AutoML representatives further reveals that they offer many features that can be used to automate the development pipeline of image-based road health inspection systems. Nevertheless, we find that recent works in image-based road health inspection have not used any form of AutoML. Digging into these recent works, we identify two main problems that best explain why most researchers do not yet use AutoML in their image-based road health inspection systems. First, AutoML's trial-and-error decisions involve much more computation than human-guided decisions. Second, using AutoML adds another layer of non-interpretability to a model. As these two problems are major pain points in modern neural networks and deep learning, they may take years to resolve, delaying the mass adoption of AutoML in image-based road health inspection systems.
In conclusion, although the use of AutoML is not mainstream at this moment, we believe that the trend of AutoML will continue to grow. This is because there is demand for AutoML now, and demand for no-code or low-code machine learning development alternatives will grow further as machine learning solutions expand. Nevertheless, this case study focuses on selected papers whose authors are researchers who can publish their work in academic conferences and journals. Future work should extend the study to novice users, non-programmer users, and machine learning practitioners in order to gain insights from non-research perspectives.