Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, United States.
J Phys Chem B. 2021 Sep 23;125(37):10610-10620. doi: 10.1021/acs.jpcb.1c05264. Epub 2021 Sep 9.
Surfactants are amphiphilic molecules that are widely used in consumer products, industrial processes, and biological applications. A critical property of a surfactant is the critical micelle concentration (CMC), which is the concentration at which surfactant molecules undergo cooperative self-assembly in solution. Notably, the primary experimental method for obtaining CMCs, tensiometry, is laborious and expensive. In this study, we show that graph convolutional neural networks (GCNs) can predict CMCs directly from the surfactant molecular structure. In particular, we developed a GCN architecture that encodes the surfactant structure in the form of a molecular graph and trained it using experimental CMC data. We found that the GCN can predict CMCs with higher accuracy on a more inclusive data set than previously proposed methods and that it can generalize to anionic, cationic, zwitterionic, and nonionic surfactants using a single model. Molecular saliency maps revealed how atom types and surfactant molecular substructures contribute to CMCs, and this behavior agreed with physical rules that correlate constitutional and topological information to CMCs. Following such rules, we proposed a small set of new surfactants for which experimental CMCs are not available; for these molecules, CMCs predicted with our GCN exhibited similar trends to those obtained from molecular simulations. These results provide evidence that GCNs can enable high-throughput screening of surfactants with desired self-assembly characteristics.
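To illustrate the idea of encoding a molecule as a graph and passing it through graph convolutions, the following is a minimal sketch and not the architecture used in the study. It assumes one-hot atom-type features, two generic graph-convolution layers with symmetric adjacency normalization, mean pooling over atoms, and a linear readout. The names `predict`, `VOCAB`, and `WEIGHTS` are hypothetical, the example molecule (the heavy atoms of 1-butanol) is chosen only for brevity, and the weights are random and untrained, so the printed number carries no chemical meaning.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(bonds, n_atoms):
    """Build D^-1/2 (A + I) D^-1/2 from an undirected bond list."""
    a = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbor features, transform, apply ReLU."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(v, 0.0) for v in row] for row in z]


def predict(atoms, bonds, vocab, weights):
    """Map a molecular graph to one scalar (a stand-in for a CMC-like target)."""
    w1, w2, w_out = weights
    # One-hot atom-type features (an assumption; real models use richer features).
    x = [[1.0 if vocab[s] == k else 0.0 for k in range(len(vocab))] for s in atoms]
    a_norm = normalized_adjacency(bonds, len(atoms))
    h = gcn_layer(a_norm, gcn_layer(a_norm, x, w1), w2)
    # Mean pooling makes the output independent of atom ordering.
    pooled = [sum(col) / len(atoms) for col in zip(*h)]
    return sum(p * w for p, w in zip(pooled, w_out))


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


VOCAB = {"C": 0, "O": 1}  # hypothetical two-element atom vocabulary
rng = random.Random(0)
# Untrained random weights: 2 input features -> 8 hidden -> 8 hidden -> 1 output.
WEIGHTS = (random_matrix(rng, 2, 8), random_matrix(rng, 8, 8),
           [rng.gauss(0.0, 1.0) for _ in range(8)])

# Heavy-atom graph of 1-butanol: C-C-C-C-O.
atoms = ["C", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
prediction = predict(atoms, bonds, VOCAB, WEIGHTS)
print(prediction)
```

In a trained model the weights would be fitted to experimental CMC data, and a saliency map could be obtained by examining how the output changes with respect to each atom's input features.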