AutoML based workflow for design of experiments (DOE) selection and benchmarking data acquisition strategies with simulation models.

Authors

Xu Xukuan, Li Donghui, Bi Jinghou, Moeckel Michael

Affiliations

Aschaffenburg University of Applied Sciences, Faculty of Engineering, Aschaffenburg, 63743, Germany.

Dresden University of Technology DE, Faculty of Engineering, Dresden, 01069, Germany.

Publication information

Sci Rep. 2024 Dec 31;14(1):32170. doi: 10.1038/s41598-024-83581-3.

DOI: 10.1038/s41598-024-83581-3
PMID: 39741203
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11688508/
Abstract

Design of experiments (DOE) is an established method to allocate resources for efficient parameter space exploration. Model based active learning (AL) data sampling strategies have shown potential for further optimization. This paper introduces a workflow for conducting DOE comparative studies using automated machine learning. Based on a practical definition of model complexity in the context of machine learning, the interplay of systematic data generation and model performance is examined considering various sources of uncertainty: this includes uncertainties caused by stochastic sampling strategies, imprecise data, suboptimal modeling, and model evaluation. Results obtained from electrical circuit models with varying complexity show that not all AL sampling strategies outperform conventional DOE strategies, depending on the available data volume, the complexity of the dataset, and data uncertainties. Trade-offs in resource allocation strategies, in particular between identical replication of data points for statistical noise reduction and broad sampling for maximum parameter space exploration, and their impact on subsequent machine learning analysis are systematically investigated. Results indicate that replication oriented strategies should not be dismissed but may prove advantageous for cases with non-negligible noise impact and intermediate resource availability. The provided workflow can be used to simulate practical experimental conditions for DOE testing and DOE selection.
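The trade-off between replicating data points and sampling broadly can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' workflow: the response function `true_response`, the evenly spaced grid design, the fixed cubic polynomial fit (standing in for an AutoML-selected model), and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np


def true_response(x):
    # Hypothetical smooth response standing in for a simulation model
    # (the paper uses electrical circuit models; this is NOT one of them).
    return np.sin(2.0 * np.pi * x)


def make_design(budget, replicates):
    """Spend `budget` measurements on budget // replicates unique points
    spread evenly over [0, 1], each measured `replicates` times."""
    n_unique = budget // replicates
    unique_points = np.linspace(0.0, 1.0, n_unique)
    return np.repeat(unique_points, replicates)


def evaluate(budget, replicates, noise_sd, degree=3, trials=200, seed=0):
    """Average test RMSE of a polynomial fit over repeated noisy draws."""
    rng = np.random.default_rng(seed)
    x_train = make_design(budget, replicates)
    x_test = np.linspace(0.0, 1.0, 201)
    y_test = true_response(x_test)
    errors = []
    for _ in range(trials):
        # Each trial redraws the measurement noise on the same design.
        noise = rng.normal(0.0, noise_sd, size=x_train.shape)
        y_train = true_response(x_train) + noise
        coeffs = np.polyfit(x_train, y_train, degree)
        pred = np.polyval(coeffs, x_test)
        errors.append(np.sqrt(np.mean((pred - y_test) ** 2)))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Same total budget, split differently between coverage and replication.
    for noise_sd in (0.0, 0.2, 0.5):
        for replicates in (1, 2, 4):
            rmse = evaluate(budget=24, replicates=replicates, noise_sd=noise_sd)
            print(f"noise_sd={noise_sd:.1f} replicates={replicates} rmse={rmse:.3f}")
```

Running the script prints the average test error for each combination of noise level and replication count at a fixed budget of 24 measurements. Which split comes out ahead depends on the assumed response, model, and noise level, so the numbers should be read as an illustration of the comparison setup rather than as a reproduction of the paper's results.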


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/782403c5bb70/41598_2024_83581_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/0191e50967da/41598_2024_83581_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/3341f6eae176/41598_2024_83581_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/37e2c19d7c8f/41598_2024_83581_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/f28019d77bd1/41598_2024_83581_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/80d51849e58a/41598_2024_83581_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/326f/11688508/95d25e85664e/41598_2024_83581_Fig7_HTML.jpg
