Machine learning prediction of empirical polarity using SMILES encoding of organic solvents.

Affiliation

Department of Chemistry & Centre for Advanced Studies in Chemistry, Panjab University, Chandigarh, 160014, India.

Publication information

Mol Divers. 2023 Oct;27(5):2331-2343. doi: 10.1007/s11030-022-10559-6. Epub 2022 Nov 5.

Abstract

Machine learning based statistical models have significantly increased the speed and accuracy with which the chemical and physical properties of compounds can be predicted, compared with experimental measurement and with traditional ab initio and quantum mechanical approaches. The transformative impact of these techniques on the chemical sciences has changed the way experiments are designed, and the last decade has seen computer-aided molecular design based on machine learning algorithms rise to prominence. The major challenge has been generating machine-readable data, in the form of descriptors and observations, for training the model; this can be time-consuming and computationally expensive when a molecular encoding based on atomic coordinates is used. In this study, we address this problem by using the SMILES representation of molecules to generate various topological, physicochemical, electronic and steric descriptors with open-source cheminformatics packages. With the data generated by these packages, we developed a simple and explainable quantitative structure-property relationship (QSPR) model, an artificial neural network (ANN) based on 7 numerical descriptors and 1 categorical descriptor, for predicting the empirical polarity of a wide diversity of organic solvents. Since polarity reflects the various solute-solvent and solvent-solvent interactions taking place in an organic transformation, knowing it beforehand helps a chemist design better experiments. An ANN algorithm based on 8 descriptors was successfully employed to predict the E_T(30) values of organic solvents.
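As a rough illustration of the input layout the abstract describes (7 numerical descriptors plus 1 categorical descriptor feeding an ANN), the following minimal sketch standardizes the numerical values, one-hot encodes the categorical one, and runs a single forward pass through a small network. Everything specific here is an assumption made for illustration: the descriptor names, the category labels, the layer sizes, the random untrained weights, and the placeholder input values are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical descriptor names; the abstract does not list the actual eight.
NUMERIC_KEYS = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
CATEGORIES = ["class_a", "class_b", "class_c"]  # hypothetical solvent classes


def encode(numeric, category, means, stds):
    """Standardize 7 numeric descriptors and one-hot encode the categorical one."""
    if len(numeric) != len(NUMERIC_KEYS):
        raise ValueError("expected 7 numeric descriptors")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    scaled = [(x - m) / s for x, m, s in zip(numeric, means, stds)]
    one_hot = [1.0 if category == c else 0.0 for c in CATEGORIES]
    return scaled + one_hot


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply one dense layer, optionally followed by an activation function."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def predict(x, hidden, output):
    """Forward pass: tanh hidden layer, then a linear output (one regression value)."""
    return dense(dense(x, hidden, math.tanh), output)[0]


rng = random.Random(0)
n_inputs = len(NUMERIC_KEYS) + len(CATEGORIES)
hidden = init_layer(n_inputs, 5, rng)  # 5 hidden units is an arbitrary choice
output = init_layer(5, 1, rng)

# Placeholder numbers, not data from the paper.
x = encode([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], "class_b",
           means=[1.0] * 7, stds=[2.0] * 7)
y = predict(x, hidden, output)
```

In practice the numerical descriptors would come from a cheminformatics package applied to each solvent's SMILES string, and the weights would be fitted to measured polarity values; the sketch only shows how the eight descriptors could be assembled into one input vector.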
