Sreenivasan Akshai P, Harrison Philip J, Schaal Wesley, Matuszewski Damian J, Kultima Kim, Spjuth Ola
Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden.
Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
J Cheminform. 2022 Jul 15;14(1):47. doi: 10.1186/s13321-022-00622-7.
Comparing chemical structures to infer protein targets and functions is a common approach, but comparisons based on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The models are trained on clusters of compounds, where the clusters are derived from similarities calculated with a network topology approach on combined compound-protein and protein-protein interaction data. We compare several deep learning architectures, including both convolutional and recurrent neural networks. The best-performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, an in-depth analysis of eleven well-studied chemical compounds with known functions showed that the predictions were justifiable for all but one compound. Four of these compounds, similar in molecular structure but dissimilar in function, revealed the advantages of our method over the use of chemical similarity alone.
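As an illustration of the evaluation metric, the following is a minimal sketch of how an F1 score for multi-class cluster prediction could be computed. The abstract does not state which averaging scheme was used, so macro-averaging over clusters is an assumption here, and the function names `per_class_f1` and `macro_f1` as well as the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct cluster assignment
        else:
            fp[p] += 1   # predicted cluster gains a false positive
            fn[t] += 1   # true cluster gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example: six compounds assigned to three hypothetical clusters.
true_clusters = [0, 0, 1, 1, 2, 2]
pred_clusters = [0, 0, 1, 2, 2, 2]
print(round(macro_f1(true_clusters, pred_clusters), 3))  # → 0.822
```

Macro-averaging weights every cluster equally regardless of size; a support-weighted or micro-averaged F1 would give different values when clusters are imbalanced.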