Li Tonglei, Huls Nicholas J, Lu Shan, Hou Peng
Department of Industrial and Molecular Pharmaceutics, Purdue University, West Lafayette, IN 47907, USA.
Commun Chem. 2024 Jun 11;7(1):133. doi: 10.1038/s42004-024-01217-z.
Molecular representation is critical in chemical machine learning. It governs the complexity of model development and the amount of training data needed to avoid either over- or under-fitting. Because electronic structures and their associated attributes are the root cause of molecular interactions and the properties those interactions produce, we have sought to examine local electron information on a molecular manifold in order to understand and predict molecular interactions. This effort led to the development of a lower-dimensional representation of a molecular manifold, Manifold Embedding of Molecular Surface (MEMS), that embodies surface electronic quantities. By treating a molecular surface as a manifold and computing its embeddings, the embedded electronic attributes retain the chemical intuition of molecular interactions. MEMS can be further featurized as input for chemical learning. Our solubility prediction with MEMS demonstrated the feasibility of both shallow and deep learning by neural networks, suggesting that MEMS is expressive and robust against dimensionality reduction.
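The abstract does not state which embedding algorithm MEMS uses, so the following is only a minimal sketch of the general idea under stated assumptions: a molecular surface is represented as sampled points, a per-point electronic quantity travels with each point, and the surface is flattened to 2-D while approximately preserving on-surface distances. It swaps in an Isomap-style embedding (geodesic distances from a k-nearest-neighbor graph, then classical multidimensional scaling). It also uses a unit sphere as a hypothetical stand-in surface, uses the z-coordinate as a hypothetical stand-in for an electronic quantity such as electrostatic potential, and featurizes the result by averaging that quantity over a fixed grid. All function names are hypothetical, and none of this is the authors' implementation.

```python
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform points on a sphere (hypothetical stand-in for a sampled molecular surface)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)          # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i      # golden-angle azimuth (mod 2*pi)
    return radius * np.column_stack((np.cos(theta) * np.sin(phi),
                                     np.sin(theta) * np.sin(phi),
                                     np.cos(phi)))

def geodesic_distances(points, k=8):
    """Approximate on-surface (geodesic) distances as shortest paths on a k-NN graph."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    g = np.full((n, n), np.inf)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]    # skip column 0, which is the point itself
    rows = np.repeat(np.arange(n), k)
    g[rows, nbrs.ravel()] = d[rows, nbrs.ravel()]
    g = np.minimum(g, g.T)                      # symmetrize the neighbor graph
    np.fill_diagonal(g, 0.0)
    for m in range(n):                          # Floyd-Warshall; adequate for small n
        g = np.minimum(g, g[:, m:m + 1] + g[m:m + 1, :])
    if np.isinf(g).any():
        raise ValueError("k-NN graph is disconnected; increase k")
    return g

def classical_mds(dist, dim=2):
    """Classical MDS: embed points so Euclidean distances approximate `dist`."""
    n = len(dist)
    j = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dim]          # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def grid_features(embedding, values, bins=8):
    """Average a per-point surface attribute over a bins x bins grid of the 2-D embedding."""
    lo, hi = embedding.min(axis=0), embedding.max(axis=0)
    idx = np.floor((embedding - lo) / (hi - lo + 1e-12) * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)
    sums = np.zeros((bins, bins))
    counts = np.zeros((bins, bins))
    np.add.at(sums, (idx[:, 0], idx[:, 1]), values)
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0).ravel()

surface = fibonacci_sphere(300)
potential = surface[:, 2]   # hypothetical stand-in for a surface electronic quantity
emb = classical_mds(geodesic_distances(surface), dim=2)
features = grid_features(emb, potential, bins=8)
print(emb.shape, features.shape)   # (300, 2) (64,)
```

The grid step turns a variable number of surface points into a fixed-length vector, which is what a downstream shallow or deep network needs as input. A real pipeline would presumably replace the synthetic sphere and attribute with a computed molecular surface and quantum-chemical surface quantities; a closed surface such as a sphere also cannot be flattened without distortion, which this sketch does not attempt to address.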