Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly.

Affiliations

Department for Data Science and AI, Monash University, Wellington Road, Clayton, VIC 3168, Australia.

CSIRO, Manufacturing Business Unit, Research Way, Clayton, VIC 3168, Australia.

Publication information

J Chem Inf Model. 2023 Jun 12;63(11):3288-3306. doi: 10.1021/acs.jcim.3c00460. Epub 2023 May 19.

Abstract

While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from the experimental design is extremely challenging. It requires the time- and labor-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first framework for a data-driven methodology for the probabilistic modeling of PISA morphologies, based on a selection and suitable adaptation of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with simulations, we focus on interpretable low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only 592 training data points, which we curated from the PISA literature. We found that among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show decent interpolation performance, with an estimated error rate of around 0.2 and an expected cross-entropy loss (surprisal) of around 1 bit when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When considering extrapolation to new monomer combinations, model performance is weaker, but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which renders it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, we find in three case studies that, when used to actively learn phase diagrams, the model is able to select a smart set of experiments that leads to satisfactory phase diagrams after observing only relatively few data points (5-16) for the targeted conditions. The data set as well as all model training and evaluation code are publicly available through the GitHub repository of the last author.
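
The two reported metrics, error rate and expected cross-entropy loss (surprisal) in bits, can be illustrated with a minimal sketch. It assumes a simplified single-label setting in which each sample has exactly one true morphology; the abstract describes predicting a mixture of morphologies, so the authors' exact definitions may differ. The function names, morphology labels, and probabilities below are hypothetical and are not taken from the paper's data set or code.

```python
import math


def error_rate(probs, labels):
    """Fraction of samples whose most probable class is not the true label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(p, key=p.get)  # class with the highest predicted probability
        wrong += predicted != y
    return wrong / len(labels)


def mean_surprisal_bits(probs, labels, eps=1e-12):
    """Average of -log2 p(true label), i.e. the empirical cross-entropy in bits."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip at eps so a zero probability does not produce an infinite loss.
        total += -math.log2(max(p.get(y, 0.0), eps))
    return total / len(labels)


# Hypothetical predicted distributions over three morphologies (assumed labels).
probs = [
    {"spheres": 0.5, "worms": 0.25, "vesicles": 0.25},
    {"spheres": 0.25, "worms": 0.5, "vesicles": 0.25},
    {"spheres": 0.125, "worms": 0.375, "vesicles": 0.5},
    {"spheres": 0.5, "worms": 0.25, "vesicles": 0.25},
]
labels = ["spheres", "worms", "vesicles", "worms"]

print(error_rate(probs, labels))           # 0.25
print(mean_surprisal_bits(probs, labels))  # 1.25
```

In this toy example one of four predictions is wrong (error rate 0.25), and the true labels receive probabilities 0.5, 0.5, 0.5, and 0.25, giving surprisals of 1, 1, 1, and 2 bits and a mean of 1.25 bits. Unlike the error rate, the surprisal also penalizes a model for being confidently wrong, which is why it is a natural companion metric for probabilistic predictions.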

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05be/10268968/8c30df7ac1a4/ci3c00460_0001.jpg
