Ayres Lucas B, Gomez Federico J V, Silva Maria Fernanda, Linton Jeb R, Garcia Carlos D
Department of Chemistry, Clemson University, 211 S. Palmetto Blvd, Clemson, SC, 29634, USA.
Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza (IBAM-CONICET), Universidad Nacional de Cuyo, Mendoza, Argentina.
Sci Rep. 2024 Feb 22;14(1):2715. doi: 10.1038/s41598-022-27106-w.
The application of natural deep eutectic solvents (NADES) in the pharmaceutical, agricultural, and food industries is one of the fastest-growing fields of green chemistry, as these mixtures can potentially replace traditional organic solvents. Progress, however, is limited by the development of new NADES, which today is almost exclusively empirical and often derived from known mixtures. To overcome this limitation, we propose a transformer-based machine learning approach. The transformer-based neural network was first pre-trained to recognize chemical patterns in SMILES representations (unlabeled general chemical data) and then fine-tuned for binary classification: recognizing the patterns in strings that lead either to stable NADES or to simple mixtures of compounds that do not form stable NADES. Because this strategy is adapted from language learning, it requires relatively small datasets and relatively modest computational resources. The resulting algorithm predicts the formation of multiple new stable eutectic mixtures (n = 337) from a general database of natural compounds. More importantly, the system can also predict the components and molar ratios needed to form NADES with new molecules (not present in the training database), a capability validated using previously reported NADES as well as by developing multiple novel solvents containing ibuprofen. We believe this strategy has the potential to transform the screening process for NADES, as well as the pharmaceutical industry, by streamlining the use of bioactive compounds as functional components of liquid formulations rather than as simple solutes.
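The screening step described above (finding partners and molar ratios that turn a new molecule such as ibuprofen into a NADES) can be pictured as an enumerate-and-classify loop. The sketch below is illustrative only and is not the authors' code: the abstract does not specify how mixtures are encoded for the model, so the input string format (component SMILES joined with "." plus a hypothetical "|a:b" molar-ratio suffix), the helper names `encode_mixture`, `screen_partners` and `toy_classify`, and the 0.5 decision threshold are all assumptions. `toy_classify` is a placeholder rule standing in for the fine-tuned transformer classifier and has no chemical validity.

```python
from itertools import product
from typing import Callable, Iterable


def encode_mixture(smiles_a: str, smiles_b: str, ratio: tuple[int, int]) -> str:
    # Hypothetical input format: two component SMILES joined by "." and a molar-ratio suffix.
    return f"{smiles_a}.{smiles_b}|{ratio[0]}:{ratio[1]}"


def screen_partners(
    target: str,
    candidates: Iterable[str],
    ratios: Iterable[tuple[int, int]],
    classify: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off for "predicted stable NADES"
) -> list[tuple[str, tuple[int, int], float]]:
    # Score every (partner, molar ratio) combination with a binary classifier
    # and keep those predicted to form a stable NADES, best score first.
    hits = []
    for partner, ratio in product(candidates, ratios):
        score = classify(encode_mixture(target, partner, ratio))
        if score >= threshold:
            hits.append((partner, ratio, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def toy_classify(mixture: str) -> float:
    # Placeholder scorer, NOT a trained model: favors quaternary-ammonium partners, most at 1:2.
    if "[N+]" in mixture:
        return 0.9 if mixture.endswith("|1:2") else 0.6
    return 0.1


ibuprofen = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"
candidates = ["C[N+](C)(C)CCO.[Cl-]", "CC(C)C1CCC(C)CC1O"]  # choline chloride, menthol
ratios = [(1, 1), (1, 2)]
hits = screen_partners(ibuprofen, candidates, ratios, toy_classify)
for partner, ratio, score in hits:
    print(partner, f"{ratio[0]}:{ratio[1]}", score)
```

In a real pipeline, `toy_classify` would be replaced by the fine-tuned model's predicted probability for the "stable NADES" class; keeping the classifier as a plain callable lets the enumeration logic stay independent of whichever model or framework is used.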