College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.
Biomolecules. 2023 Mar 9;13(3):503. doi: 10.3390/biom13030503.
Molecular property prediction is an important direction in computer-aided drug design. In this paper, to fully exploit the information in both the SMILES strings and the graph data of molecules, we combine the SALSTM and GAT methods to mine molecular feature information from sequences and graphs. Atom embeddings are first obtained from SMILES strings through SALSTM; they are then combined with graph node features and fed into the GAT to extract the global molecular representation. In addition, data augmentation is used to enlarge the training dataset and improve the performance of the model. Finally, to enhance the interpretability of the model, the attention layers of both models are fused to highlight the key atoms. Comparison with other graph-based and sequence-based methods on multiple datasets shows that our method can achieve high prediction accuracy with good generalizability.
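The abstract says the attention layers of the sequence model and the graph model are fused to highlight key atoms, but it does not state the fusion rule. The sketch below is only an illustration under one assumed rule: each model yields one non-negative attention score per atom, both vectors are normalized to sum to 1, and they are blended by a weighted average. The function names, the `weight` parameter, and the toy scores are hypothetical and do not come from the paper.

```python
from typing import List


def normalize(scores: List[float]) -> List[float]:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def fuse_attention(seq_attn: List[float], graph_attn: List[float],
                   weight: float = 0.5) -> List[float]:
    """Blend two per-atom attention vectors.

    Assumption: a weighted average of the normalized vectors. The paper's
    actual fusion rule is not given in the abstract.
    """
    if len(seq_attn) != len(graph_attn):
        raise ValueError("both vectors need one score per atom")
    seq_n, graph_n = normalize(seq_attn), normalize(graph_attn)
    return [weight * s + (1.0 - weight) * g for s, g in zip(seq_n, graph_n)]


def top_atoms(fused: List[float], k: int = 2) -> List[int]:
    """Return the indices of the k highest-scoring atoms."""
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)[:k]


# Toy scores for the three heavy atoms of ethanol, SMILES "CCO".
# These numbers are made up for illustration only.
seq_attn = [0.2, 0.2, 0.6]    # hypothetical sequence-model attention
graph_attn = [0.1, 0.3, 0.6]  # hypothetical graph-model attention
fused = fuse_attention(seq_attn, graph_attn)
key_atoms = top_atoms(fused, k=1)
```

With these toy inputs both models weight the oxygen atom most heavily, so it stays the top-ranked atom after fusion. A learned or multiplicative fusion would be an equally plausible reading of the abstract.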