Xu Xukuan, Li Donghui, Bi Jinghou, Moeckel Michael
Aschaffenburg University of Applied Sciences, Faculty of Engineering, Aschaffenburg, 63743, Germany.
Dresden University of Technology, Faculty of Engineering, Dresden, 01069, Germany.
Sci Rep. 2024 Dec 31;14(1):32170. doi: 10.1038/s41598-024-83581-3.
Design of experiments (DOE) is an established method for allocating resources so that a parameter space is explored efficiently. Model-based active learning (AL) data sampling strategies have shown potential for further optimization. This paper introduces a workflow for conducting comparative DOE studies using automated machine learning. Based on a practical definition of model complexity in the context of machine learning, the interplay of systematic data generation and model performance is examined under several sources of uncertainty: stochastic sampling strategies, imprecise data, suboptimal modeling, and model evaluation. Results obtained from electrical circuit models of varying complexity show that not all AL sampling strategies outperform conventional DOE strategies; the outcome depends on the available data volume, the complexity of the dataset, and data uncertainties. Trade-offs in resource allocation strategies and their impact on subsequent machine learning analysis are systematically investigated, in particular the trade-off between identical replication of data points for statistical noise reduction and broad sampling for maximum parameter space exploration. Results indicate that replication-oriented strategies should not be dismissed and may prove advantageous in cases with non-negligible noise impact and intermediate resource availability. The provided workflow can be used to simulate practical experimental conditions for DOE testing and DOE selection.
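The replication versus exploration trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' workflow: the toy response function, the polynomial surrogate, the uniform random sampling, and the names `toy_response`, `sample_design`, and `surrogate_rmse` are all assumptions introduced here for illustration. A fixed measurement budget is split into distinct points, each measured a given number of times; averaging the replicates lowers the noise at each point while reducing how much of the parameter space is covered.

```python
import numpy as np


def toy_response(x):
    # Hypothetical stand-in for a simulated circuit output (assumption).
    return np.sin(2.0 * np.pi * x)


def sample_design(budget, replicates, noise_std, rng):
    """Spend `budget` measurements on budget // replicates distinct points."""
    n_points = budget // replicates
    x = rng.uniform(0.0, 1.0, size=n_points)
    # Each point is measured `replicates` times with independent Gaussian noise,
    # and the replicates are averaged into one observation per point.
    noise = rng.normal(0.0, noise_std, size=(n_points, replicates))
    y_mean = (toy_response(x)[:, None] + noise).mean(axis=1)
    return x, y_mean


def surrogate_rmse(x, y, degree=5):
    """Fit a polynomial surrogate and score it on a noise-free grid."""
    coeffs = np.polyfit(x, y, deg=degree)
    grid = np.linspace(0.0, 1.0, 200)
    residual = np.polyval(coeffs, grid) - toy_response(grid)
    return float(np.sqrt(np.mean(residual ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Same budget, different splits between replication and exploration.
    for replicates in (1, 2, 4):
        x, y = sample_design(budget=48, replicates=replicates,
                             noise_std=0.3, rng=rng)
        print(replicates, len(x), surrogate_rmse(x, y))
```

Which split gives the lowest error depends on the noise level, the budget, and the random seed, so a single run of this sketch should not be read as evidence for either strategy; repeating it over many seeds would be needed to compare them.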