Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset.

Author information

Chitre Aniket, Querimit Robert C M, Rihm Simon D, Karan Dogancan, Zhu Benchuan, Wang Ke, Wang Long, Hippalgaonkar Kedar, Lapkin Alexei A

Affiliations

Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK.

Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd. 1 CREATE Way, CREATE Tower #05-05, Singapore, 138602, Singapore.

Publication information

Sci Data. 2024 Jul 3;11(1):728. doi: 10.1038/s41597-024-03573-w.

Abstract

Liquid formulations are ubiquitous, yet their product development cycles are lengthy because complex physical interactions between ingredients make it difficult to tune formulations to customer-defined property targets. Interpolative ML models can accelerate liquid formulation design, but they are typically trained on limited sets of ingredients and without any structural information, which limits their predictive capacity outside the training set. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulation development. The dimensionality of the resulting design space is more than 50-fold higher than in our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, which cover the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. Our dataset has the unique attribute of sample-specific uncertainty measurements, which can be used to train predictive surrogate models.
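As an illustration of how sample-specific uncertainties could enter surrogate-model training, the following is a minimal sketch using inverse-variance weighted least squares. It is not the authors' method and does not read the published dataset: the data are synthetic, the three "ingredient fraction" inputs and the linear response are assumptions made for the example, and the function names `fit_weighted_linear` and `make_synthetic` are hypothetical.

```python
import numpy as np


def fit_weighted_linear(X, y, sigma):
    """Fit y ~ b0 + X @ b by minimising sum(((y - prediction) / sigma) ** 2).

    sigma holds one measurement uncertainty per sample, so noisier
    samples contribute less to the fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = 1.0 / sigma  # row scaling by 1/sigma gives 1/sigma**2 weights in the squared loss
    beta, *_ = np.linalg.lstsq(Xb * w[:, None], y * w, rcond=None)
    return beta


def make_synthetic(n=200, seed=0):
    """Generate synthetic data (assumption: not the real dataset)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 3))  # hypothetical ingredient fractions
    true_beta = np.array([0.5, 2.0, -1.0, 0.3])  # intercept followed by three slopes
    sigma = rng.uniform(0.05, 0.5, size=n)  # per-sample uncertainty
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(0.0, sigma)
    return X, y, sigma, true_beta


if __name__ == "__main__":
    X, y, sigma, true_beta = make_synthetic()
    print("fitted coefficients:", fit_weighted_linear(X, y, sigma))
    print("true coefficients:  ", true_beta)
```

The same weighting idea carries over to other surrogate models that accept per-sample noise levels or sample weights; a linear model is used here only to keep the sketch short.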

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4098/11222379/42fbffa883c2/41597_2024_3573_Fig1_HTML.jpg
