Shi Zhiwei, Ma Miao, Ning Hanyang, Yang Bo, Dang Jingshuang
School of Computer Science, Shaanxi Normal University, Xi'an, 710119, People's Republic of China.
Institute of New Concept Sensors and Molecular Materials, Shaanxi Normal University, Xi'an, 710119, People's Republic of China.
Mol Divers. 2025 Jan 25. doi: 10.1007/s11030-024-11100-7.
Molecular property prediction (MPP) is a fundamental task in chemistry, materials science, biology, and medicine, where traditional computational chemistry methods based on quantum mechanics often consume substantial time and computing power. In recent years, machine learning has been increasingly used in computational chemistry, and graph neural networks in particular have performed well on molecular property prediction tasks, but they remain limited in generalizability, interpretability, and certainty. To address these challenges, this paper proposes a Multiscale Molecular Structural Neural Network (MMSNet). MMSNet obtains rich multiscale molecular representations by fusing information from bonded and non-bonded "message passing" structures at the atomic scale with spatial feature information from "encoder-decoder" structures at the molecular scale. A multi-level attention mechanism, motivated by a theoretical analysis of molecular mechanics, is introduced to enhance the model's interpretability. To quantify the model's certainty, the prediction results of MMSNet are used as label values and clustered in the molecular library with the K-Nearest Neighbors (K-NN) algorithm to reverse match the spatial structure of the molecules, and the virtual screening results are compared across different values of K. Experimental results on the small-molecule dataset QM9 and the macromolecular protein database PDBbind show that, compared with more than ten existing state-of-the-art (SOTA) models, MMSNet achieves the best prediction accuracy, model complexity, and generalizability across a variety of prediction tasks. It has great potential for downstream tasks such as chemical research, drug discovery, and material design.
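The K-NN reverse-matching step is described only in outline in the abstract. The following is a minimal sketch of one possible reading, not the authors' implementation: each predicted property value is used as a query against a library of reference property values, and a match counts as a hit when the molecule's own library entry is among the K nearest values. The function names (`knn_indices`, `hit_rate`), the use of a single scalar property, the absolute-difference distance, and the toy numbers are all assumptions made for illustration.

```python
def knn_indices(library_values, query_value, k):
    """Return the indices of the k library entries closest to query_value.

    Assumption: distance is the absolute difference of one scalar property.
    """
    order = sorted(
        range(len(library_values)),
        key=lambda i: abs(library_values[i] - query_value),
    )
    return order[:k]


def hit_rate(library_values, predictions, true_indices, k):
    """Fraction of predictions whose true library entry is among the k nearest."""
    hits = 0
    for predicted, true_index in zip(predictions, true_indices):
        if true_index in knn_indices(library_values, predicted, k):
            hits += 1
    return hits / len(predictions)


# Toy data (hypothetical): reference property values for a five-molecule library,
# model predictions for three of those molecules, and their library indices.
library_values = [0.10, 0.50, 0.90, 1.40, 2.00]
predictions = [0.12, 0.80, 1.90]
true_indices = [0, 1, 4]

# Compare the screening hit rate across several values of K.
rates = {k: hit_rate(library_values, predictions, true_indices, k) for k in (1, 2, 3)}
for k, rate in rates.items():
    print(f"K={k}: hit rate {rate:.2f}")
```

In this sketch, a hit rate that is already high at small K would suggest that predictions land close to the correct library entries, which is one simple way the comparison across K values could serve as a certainty measure.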