Geometry-Augmented Molecular Representation Learning for Property Prediction

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2024 Sep-Oct;21(5):1518-1528. doi: 10.1109/TCBB.2024.3402337. Epub 2024 Oct 9.

Abstract

Accurate molecular representation plays a crucial role in expediting drug discovery. Graph neural networks (GNNs) have demonstrated robust capabilities in molecular representation learning and are adept at capturing structural and spatial information in molecular graphs. However, most previous GNN methods specialize in either 2D or 3D molecular data formats. Fusing the geometric attributes and structural features of molecules can further improve the quality of the learned representations. To this end, we present a novel geometry-augmented molecular representation learning model designed to effectively encode both the 2D structural and 3D spatial information inherent in molecular graphs. By incorporating structural and spatial information as attention biases in the graph Transformer framework, our model offers a comprehensive architecture that introduces molecular structural details at both the atom and bond levels. We further propose a geometry information fusion module to encode the geometric information within 3D molecular graphs. Experimental results show that our model achieves competitive performance compared with state-of-the-art (SOTA) models on various property prediction tasks.
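
The abstract does not give the form of the attention biases, so the following is only a minimal sketch of the general idea, under assumptions of our own: a pairwise bias built from bond-graph hop counts (2D structure) and Euclidean distances (3D geometry) is added to the attention scores before the softmax. The function names and the fixed weights `hop_weight` and `dist_weight` are hypothetical; a real graph Transformer would typically learn such terms, and the paper's actual formulation may differ.

```python
import math

def shortest_path_hops(num_atoms, bonds):
    """All-pairs hop counts on the bond graph via BFS.
    Unreachable pairs keep the sentinel value num_atoms."""
    adj = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    hops = [[num_atoms] * num_atoms for _ in range(num_atoms)]
    for src in range(num_atoms):
        hops[src][src] = 0
        frontier = [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if hops[src][v] == num_atoms:  # not visited yet
                        hops[src][v] = hops[src][u] + 1
                        nxt.append(v)
            frontier = nxt
    return hops

def attention_bias(hops, coords, hop_weight=-0.5, dist_weight=-1.0):
    """Hypothetical bias: fixed linear penalties on hop count (2D)
    and Euclidean distance (3D). Assumed for illustration only."""
    n = len(coords)
    return [[hop_weight * hops[i][j] + dist_weight * math.dist(coords[i], coords[j])
             for j in range(n)] for i in range(n)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def biased_attention(q, k, v, bias):
    """Single-head attention: softmax(q k^T / sqrt(d) + bias) v."""
    d = len(q[0])
    weights, out = [], []
    for i in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) + bias[i][j]
                  for j in range(len(k))]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights

# Toy three-atom chain 0-1-2 laid out on a line (made-up coordinates).
hops = shortest_path_hops(3, [(0, 1), (1, 2)])
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bias = attention_bias(hops, coords)
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = biased_attention(feats, feats, feats, bias)
```

With negative weights, atoms that are far apart in the bond graph or in space receive less attention, which is one simple way structural and spatial information can steer a Transformer layer.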
