Department of Physics and Atmospheric Science, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada.
Department of Chemistry, Dalhousie University, Halifax, Nova Scotia B3H 4R2, Canada.
ACS Appl Mater Interfaces. 2021 Mar 10;13(9):11449-11460. doi: 10.1021/acsami.0c21036. Epub 2021 Mar 1.
The most direct approach to determining whether two aqueous solutions will phase-separate upon mixing is to screen them exhaustively in a pair-wise fashion. This is a time-consuming process that involves preparation of numerous stock solutions, precise transfer of highly concentrated and often viscous solutions, exhaustive agitation to ensure thorough mixing, and time-sensitive monitoring for emulsion characteristics indicative of phase separation. Here, we examined the pair-wise mixing behavior of 68 water-soluble compounds by observing the formation of microscopic phase boundaries and droplets in 2278 unique two-component solutions. A series of machine learning classifiers (artificial neural network, random forest, k-nearest neighbors, and support vector classifier) were then trained on physicochemical property data associated with the 68 compounds and used to predict their miscibility upon mixing. Miscibility predictions were then compared to the experimental observations. The random forest classifier was the most successful of those tested, displaying an average receiver operating characteristic area under the curve of 0.74. The random forest classifier was validated by removing either one or two compounds from the input data, training the classifier on the remaining data, and then using it to predict the miscibility of solutions involving the removed compound(s). The accuracy, specificity, and sensitivity of the random forest classifier were 0.74, 0.80, and 0.51, respectively, when one of the two compounds to be examined was not represented in the training data. When asked to predict the miscibility of two compounds, neither of which was represented in the training data, the accuracy, specificity, and sensitivity values for the random forest classifier were 0.70, 0.82, and 0.29, respectively. Thus, there is potential for this machine learning approach to improve the design of screening experiments to accelerate the discovery of aqueous two-phase systems for numerous scientific and industrial applications.
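The pair count and the compound-removal validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the compound names are placeholders, the helper functions `unique_pairs`, `leave_compounds_out`, and `binary_metrics` are hypothetical, and the abstract does not state which class (miscible or phase-separating) is treated as the positive label, so the metric function simply assumes 1 marks the positive class. The classifier itself is omitted; any trained model could supply the predictions.

```python
from itertools import combinations


def unique_pairs(compounds):
    """Return all unordered pairs of distinct compounds."""
    return list(combinations(compounds, 2))


def leave_compounds_out(pairs, held_out):
    """Split pairs by how many held-out compounds each one contains.

    Returns (train, test_one, test_both):
      train     - pairs with no held-out compound
      test_one  - pairs with exactly one held-out compound
      test_both - pairs in which both compounds are held out
    """
    held = set(held_out)
    buckets = ([], [], [])
    for a, b in pairs:
        n_held = (a in held) + (b in held)  # 0, 1, or 2
        buckets[n_held].append((a, b))
    return buckets


def binary_metrics(y_true, y_pred):
    """Accuracy, specificity, and sensitivity for 0/1 labels.

    Assumption: 1 is the positive class; the abstract does not say
    which outcome was coded as positive.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, specificity, sensitivity


# Placeholder names standing in for the 68 water-soluble compounds.
compounds = [f"compound_{i:02d}" for i in range(68)]
pairs = unique_pairs(compounds)
print(len(pairs))  # 68 * 67 / 2 = 2278

# Hold out two compounds, as in the stricter validation setting.
train, test_one, test_both = leave_compounds_out(pairs, compounds[:2])
print(len(train), len(test_one), len(test_both))
```

Splitting by compound rather than by pair keeps every solution that involves a removed compound out of the training set, which is what allows the two reported settings (one unseen compound, two unseen compounds) to be scored separately.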