TranGRU: focusing on both the local and global information of molecules for molecular property prediction.

Authors

Jiang Jing, Zhang Ruisheng, Ma Jun, Liu Yunwu, Yang Enjie, Du Shikang, Zhao Zhili, Yuan Yongna

Affiliations

School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou, 730000 Gansu China.

Key Laboratory of China's Ethnic Languages and Information Technology of Ministry of Education, Northwest Minzu University, Baiyin Road, Lanzhou, 730030 Gansu China.

Publication information

Appl Intell (Dordr). 2023;53(12):15246-15260. doi: 10.1007/s10489-022-04280-y. Epub 2022 Nov 14.

DOI:10.1007/s10489-022-04280-y

PMID:36405344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9662124/

Abstract

Molecular property prediction is an essential but challenging task in drug discovery. Recurrent neural networks (RNNs) and the Transformer are the mainstream methods for sequence modeling, and each has been applied successfully to molecular property prediction on its own. Because both the local and the global information of a molecule matter for its properties, we integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder alongside self-attention, so that the two kinds of information are captured simultaneously. The resulting approach, TranGRU, encodes local molecular information with the BiGRU and global molecular information with self-attention, then fuses the two molecular representations through a gate mechanism. This design enhances the model's ability to encode both local and global molecular information. On Tox21, with each task treated as a single-task classification, TranGRU outperforms the baselines on 9 of 12 tasks and the state-of-the-art methods on 5 of 12 tasks. TranGRU also obtains the best ROC-AUC scores on BBBP, FDA, LogP, and Tox21 (multitask classification) and performs comparably on ToxCast, BACE, and ecoli. On the whole, TranGRU achieves better performance for molecular property prediction. The source code is available on GitHub: https://github.com/Jiangjing0122/TranGRU.
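
The gate-based fusion of a local and a global representation can be illustrated with a minimal sketch. The abstract does not give the gate equation, so the form used here, `gate = sigmoid(W [h_local; h_global] + b)` followed by `fused = gate * h_local + (1 - gate) * h_global`, is an assumption (a common gating pattern), not the authors' exact formulation. The names `gated_fusion` and `random_gate_params` are hypothetical, and the input vectors stand in for the outputs of a BiGRU and a self-attention layer, which are not implemented here.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_local, h_global, weights, bias):
    """Fuse two same-length vectors with an element-wise gate.

    ASSUMED form (not taken from the paper):
        gate  = sigmoid(W @ [h_local; h_global] + b)
        fused = gate * h_local + (1 - gate) * h_global
    """
    if len(h_local) != len(h_global):
        raise ValueError("both representations must have the same dimension")
    concat = h_local + h_global  # list concatenation: [h_local; h_global]
    fused = []
    for i, (loc, glo) in enumerate(zip(h_local, h_global)):
        # One gate value per feature, computed from both representations.
        pre_activation = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(pre_activation)
        fused.append(gate * loc + (1.0 - gate) * glo)
    return fused


def random_gate_params(dim, seed=0):
    """Hypothetical helper: small random gate weights and a zero bias."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


# Placeholder vectors standing in for BiGRU (local) and self-attention (global) outputs.
h_local = [0.2, -0.5, 0.9]
h_global = [0.4, 0.1, -0.3]
weights, bias = random_gate_params(dim=3)
fused = gated_fusion(h_local, h_global, weights, bias)
print(fused)
```

Because each gate value lies strictly between 0 and 1, every fused feature falls between the corresponding local and global values. In a trained model the gate parameters would be learned, letting the network decide per feature how much local versus global information to keep.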
