Tang Bowen, Kramer Skyler T, Fang Meijuan, Qiu Yingkun, Wu Zhen, Xu Dong
Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen, 361000, China.
Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA.
J Cheminform. 2020 Feb 21;12(1):15. doi: 10.1186/s13321-020-0414-z.
Efficient and accurate prediction of molecular properties, such as lipophilicity and solubility, is highly desirable for rational compound design in the chemical and pharmaceutical industries. To this end, we build and apply a graph neural network framework called the self-attention-based message-passing neural network (SAMPN) to study the relationship between chemical properties and structures in an interpretable way. The main advantages of SAMPN are that it operates directly on chemical graphs and breaks the black-box mold of many machine learning and deep learning methods. Specifically, its attention mechanism indicates the degree to which each atom of the molecule contributes to the property of interest, and these results are easily visualized. Further, SAMPN outperforms random forests and the deep learning framework MPN from DeepChem. In addition, another formulation of SAMPN (Multi-SAMPN) can simultaneously predict multiple chemical properties with higher accuracy and efficiency than other models that predict one specific chemical property. Moreover, SAMPN can generate chemically interpretable results that are easy to visualize, which can help researchers discover new pharmaceuticals and materials. The source code of the SAMPN prediction pipeline is freely available on GitHub (https://github.com/tbwxmu/SAMPN).
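The idea that an attention mechanism can assign each atom a contribution score may be easier to follow with a small sketch. The code below is not the SAMPN implementation; it is a minimal, dependency-free illustration of scaled dot-product self-attention applied to per-atom hidden states, assuming those states have already been produced by some message-passing step. The function names `softmax` and `self_attention_readout`, the averaging readout, and the toy atom vectors are all hypothetical choices made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_readout(atom_states):
    """Illustrative self-attention over atom hidden states.

    atom_states: list of n atoms, each a list of `dim` floats
    (assumed to come from a message-passing step, not shown here).
    Returns (mol_vector, atom_weights).
    """
    n = len(atom_states)
    dim = len(atom_states[0])
    scale = math.sqrt(dim)

    # Attention matrix: row i holds how strongly atom i attends to each atom j.
    attn = []
    for h_i in atom_states:
        scores = [sum(a * b for a, b in zip(h_i, h_j)) / scale
                  for h_j in atom_states]
        attn.append(softmax(scores))

    # Attention-weighted atom states.
    attended = [[sum(attn[i][j] * atom_states[j][k] for j in range(n))
                 for k in range(dim)]
                for i in range(n)]

    # Assumed readout: average the attended atom states into one molecule vector.
    mol_vector = [sum(attended[i][k] for i in range(n)) / n for k in range(dim)]

    # Assumed per-atom score: average attention each atom receives (sums to 1).
    atom_weights = [sum(attn[i][j] for i in range(n)) / n for j in range(n)]
    return mol_vector, atom_weights


# Toy hidden states for a three-atom fragment (made-up numbers).
toy_states = [[0.2, 1.0], [1.5, -0.3], [0.1, 0.4]]
mol_vector, atom_weights = self_attention_readout(toy_states)
print(atom_weights)
```

In a sketch like this, the per-atom weights are what one would map back onto the molecular drawing as a heat map; a property-prediction head would then be trained on the molecule vector.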