Lingang Laboratory, Shanghai 200031, China.
Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae340.
Accurate prediction of molecular properties is fundamental to drug discovery and development, providing crucial guidance for effective drug design. Accurate prediction depends critically on an appropriate representation of molecular structure. Most current deep learning-based molecular representations rely primarily on two-dimensional (2D) structure information and often overlook essential three-dimensional (3D) conformational information, because 2D structures are inherently limited in conveying spatial relationships between atoms. In this study, we propose using the Gram matrix both as a condensed representation of 3D molecular structure and as an efficient pretraining objective. We then use this matrix to construct a novel molecular representation model, Pre-GTM, which inherently encapsulates 3D information. The model accurately predicts the 3D structure of a molecule by estimating its Gram matrix. Our findings demonstrate that the Pre-GTM model outperforms the baseline Graphormer model and other pretrained models on the QM9 and MoleculeNet quantitative property prediction tasks. Using the Gram matrix as a condensed representation of 3D molecular structure in the Pre-GTM model opens promising avenues for application across molecular research, including drug design, materials science, and chemical engineering.
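To illustrate why a Gram matrix can stand in for a 3D structure, the following minimal sketch builds a Gram matrix from atomic coordinates and recovers coordinates from it by eigendecomposition. It is not the Pre-GTM implementation. It assumes the Gram matrix is formed from mean-centered coordinates (the abstract does not state the exact definition), the function names are hypothetical, and the toy coordinates are made up for illustration. Recovery is only possible up to rotation and reflection, so the check compares pairwise distances rather than raw coordinates.

```python
import numpy as np


def gram_matrix(coords):
    """Gram matrix G = X X^T of mean-centered coordinates (assumed definition)."""
    centered = coords - coords.mean(axis=0)
    return centered @ centered.T


def coords_from_gram(gram, dim=3):
    """Recover coordinates (up to rotation/reflection) from a Gram matrix."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by floating-point error.
    top_vals = np.clip(eigvals[top], 0.0, None)
    return eigvecs[:, top] * np.sqrt(top_vals)


def pairwise_distances(coords):
    """Euclidean distance between every pair of points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy 4-atom geometry; hypothetical values, not taken from the paper.
coords = np.array([
    [0.00, 0.00, 0.00],
    [0.96, 0.00, 0.00],
    [-0.24, 0.93, 0.00],
    [0.50, 0.50, 1.20],
])

gram = gram_matrix(coords)
recovered = coords_from_gram(gram)

# Interatomic distances are preserved even though the orientation may differ.
distances_match = np.allclose(pairwise_distances(coords),
                              pairwise_distances(recovered))
print("distances preserved:", distances_match)
```

Because the Gram matrix is unchanged by rotating or reflecting the molecule, a model that estimates it does not need to pick an arbitrary orientation, which is one plausible reason it is convenient as a prediction target.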