Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset.

Authors

Chitre Aniket, Querimit Robert C M, Rihm Simon D, Karan Dogancan, Zhu Benchuan, Wang Ke, Wang Long, Hippalgaonkar Kedar, Lapkin Alexei A

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK.

Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, Singapore, 138602, Singapore.

Publication information

Sci Data. 2024 Jul 3;11(1):728. doi: 10.1038/s41597-024-03573-w.

DOI:10.1038/s41597-024-03573-w

PMID:38961122

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11222379/

Abstract

Liquid formulations are ubiquitous yet have lengthy product development cycles owing to the complex physical interactions between ingredients, which make it difficult to tune formulations to customer-defined property targets. Interpolative ML models can accelerate liquid formulations design but are typically trained on limited sets of ingredients and without any structural information, which limits their out-of-training predictive capacity. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulations development. The resulting design space represents a more than 50-fold increase in dimensionality compared to our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, which cover the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. Our dataset has the unique attribute of sample-specific uncertainty measurements to train predictive surrogate models.

