Zhang Zaixi, Liu Qi, Lee Chee-Kong, Hsieh Chang-Yu, Chen Enhong
Anhui Province Key Lab of Big Data Analysis and Application, University of Science and Technology of China, Hefei, Anhui 230026, China
State Key Laboratory of Cognitive Intelligence, Hefei, Anhui 230088, China
Chem Sci. 2023 Jul 19;14(31):8380-8392. doi: 10.1039/d3sc02538a. eCollection 2023 Aug 9.
Designing molecules with desirable physicochemical properties and functionalities is a long-standing challenge in chemistry, materials science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for molecule design. However, the methodology still needs refinement: most existing methods lack unified modeling of 2D topology and 3D geometry information and fail to effectively learn the structure-property relationship for molecule design. Here we present MolCode, a roto-translation equivariant generative framework for molecular graph-structure Co-design. In MolCode, 3D geometric information empowers molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Extensive experimental results show that MolCode outperforms previous methods on a series of challenging tasks, including molecule design, targeted molecule discovery, and structure-based drug design. In particular, MolCode not only consistently generates valid (99.95% validity) and diverse (98.75% uniqueness) molecular graphs/structures with desirable properties, but also generates drug-like molecules with high affinity to target proteins (61.8% high-affinity ratio), demonstrating its potential for applications in materials design and drug discovery. Our extensive investigation reveals that 2D topology and 3D geometry contain intrinsically complementary information in molecule design, and it provides new insights into machine learning-based molecule representation and generation.
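The abstract describes MolCode as roto-translation equivariant. As a hedged illustration of what that property means (this is not MolCode's actual architecture), the sketch below uses a toy EGNN-style coordinate update, a technique swapped in from equivariant graph neural networks, with a hypothetical distance weight exp(-d²). It then checks numerically that f(Rx + t) = R f(x) + t. The function names `egnn_style_update`, `axis_angle_rotation`, and `is_equivariant` are hypothetical helpers introduced only for this example.

```python
import math
import random


def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def axis_angle_rotation(axis, theta):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    x, y, z = (a / norm for a in axis)
    c, s, C = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def egnn_style_update(coords, weight=lambda d2: math.exp(-d2)):
    """Toy EGNN-style coordinate update (hypothetical, not MolCode's layer).

    x_i' = x_i + 1/(N-1) * sum_{j != i} (x_i - x_j) * weight(||x_i - x_j||^2)
    The weight depends only on squared distances, which rotations and
    translations preserve, so the update is roto-translation equivariant.
    Requires at least two atoms.
    """
    n = len(coords)
    out = []
    for i, xi in enumerate(coords):
        delta = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = weight(sum(d * d for d in diff))
            delta = [dl + w * d for dl, d in zip(delta, diff)]
        out.append([a + dl / (n - 1) for a, dl in zip(xi, delta)])
    return out


def transform(coords, R, t):
    """Apply x -> R x + t to every atom position."""
    return [[a + b for a, b in zip(mat_vec(R, x), t)] for x in coords]


def is_equivariant(f, coords, R, t, tol=1e-9):
    """Check f(R x + t) == R f(x) + t up to floating-point tolerance."""
    lhs = f(transform(coords, R, t))
    rhs = transform(f(coords), R, t)
    return all(abs(a - b) < tol for p, q in zip(lhs, rhs) for a, b in zip(p, q))


if __name__ == "__main__":
    random.seed(0)
    atoms = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(5)]
    R = axis_angle_rotation([1.0, 2.0, 3.0], 0.7)
    t = [0.5, -1.0, 2.0]
    print(is_equivariant(egnn_style_update, atoms, R, t))  # True
```

The check passes because the update uses only relative positions and pairwise distances. By contrast, a map that adds a fixed offset vector to every atom is not equivariant, since the offset is not rotated along with the molecule.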