Heydari Sajjad, Raniolo Stefano, Livi Lorenzo, Limongelli Vittorio
Department of Computer Science, University of Manitoba, Winnipeg, MB, R3T 2N2, Canada.
Faculty of Biomedical Sciences, Euler Institute, Università della Svizzera italiana (USI), via G. Buffi 13, CH-6900, Lugano, Switzerland.
Commun Chem. 2023 Jan 13;6(1):13. doi: 10.1038/s42004-022-00790-5.
Predicting the structural and energetic properties of a molecular system is a fundamental task in molecular simulations, with applications in chemistry, biology, and medicine. Over the past decade, machine learning algorithms have influenced many molecular simulation tasks, including property prediction for atomistic systems. In this paper, we propose a novel methodology for transferring knowledge obtained from simple molecular systems to a more complex one with a significantly larger number of atoms and degrees of freedom. In particular, we focus on classifying conformations as high or low free-energy. Our approach relies on (i) a novel hypergraph representation of molecules that encodes all relevant information for characterizing multi-atom interactions in a given conformation, and (ii) novel message passing and pooling layers for processing such hypergraph-structured data and making free-energy predictions from it. Despite the complexity of the problem, our results show a remarkable Area Under the Curve (AUC) of 0.92 for transfer learning from tri-alanine to deca-alanine. Moreover, we show that the same transfer learning approach can be used in an unsupervised way to group chemically related secondary structures of deca-alanine into clusters with similar free-energy values. Our study is a proof of concept that reliable transfer learning models for molecular systems can be designed, opening unexplored routes for predicting the structural and energetic properties of biologically relevant systems.
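To make the idea of message passing and pooling on a hypergraph concrete, the following is a minimal, generic sketch in plain Python. It is not the layer design used in the paper, whose details are not given in the abstract. The choice of hyperedges (bonds, angles, and one dihedral of a hypothetical four-atom chain), the one-dimensional atom features, and the use of mean aggregation are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def message_pass(
    node_feats: Dict[int, Vector],
    hyperedges: List[Tuple[int, ...]],
) -> Dict[int, Vector]:
    """One round of atom -> hyperedge -> atom mean aggregation (assumed scheme)."""
    # Step 1: each hyperedge averages the features of its member atoms.
    edge_feats = [mean([node_feats[a] for a in edge]) for edge in hyperedges]

    # Step 2: each atom averages the features of the hyperedges containing it.
    updated: Dict[int, Vector] = {}
    for atom, feat in node_feats.items():
        incident = [edge_feats[i] for i, edge in enumerate(hyperedges) if atom in edge]
        # Atoms that belong to no hyperedge keep their original features.
        updated[atom] = mean(incident) if incident else list(feat)
    return updated


def pool(node_feats: Dict[int, Vector]) -> Vector:
    """Mean pooling: collapse all atom features into one conformation-level vector."""
    return mean(list(node_feats.values()))


# Hypothetical four-atom chain 0-1-2-3 with one scalar feature per atom.
atoms: Dict[int, Vector] = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}

# Hyperedges of different sizes encode multi-atom interactions (assumed choice).
hyperedges: List[Tuple[int, ...]] = [
    (0, 1), (1, 2), (2, 3),   # bonds (2 atoms)
    (0, 1, 2), (1, 2, 3),     # angles (3 atoms)
    (0, 1, 2, 3),             # dihedral (4 atoms)
]

updated = message_pass(atoms, hyperedges)
embedding = pool(updated)
print(updated)
print(embedding)
```

In a learned model, the fixed means would typically be replaced by trainable transformations, and the pooled vector would feed a classifier that separates high from low free-energy conformations. Because the pooled vector has the same size regardless of the number of atoms, a scheme of this kind can in principle be applied to molecules of different sizes, which is the property a transfer from tri-alanine to deca-alanine would rely on.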