Department of Computer Engineering, Middle East Technical University, Ankara, Turkey.
Department of Computer Engineering, İskenderun Technical University, Hatay, Turkey.
Bioinformatics. 2021 May 5;37(5):693-704. doi: 10.1093/bioinformatics/btaa858.
Identification of interactions between bioactive small molecules and target proteins is crucial for novel drug discovery, drug repurposing and uncovering off-target effects. Due to the tremendous size of the chemical space, experimental bioactivity screening efforts require the aid of computational approaches. Although deep learning models have been successful in predicting bioactive compounds, effective and comprehensive featurization of proteins, to be given as input to deep neural networks, remains a challenge.
Here, we present a novel protein featurization approach for deep learning-based compound-target protein binding affinity prediction. In the proposed method, multiple types of protein features, such as sequence, structural, evolutionary and physicochemical properties, are incorporated within multiple 2D vectors, which are then fed to state-of-the-art pairwise input hybrid deep neural networks to predict real-valued compound-target protein interactions. The method adopts the proteochemometric approach, in which both compound and target protein features are used at the input level to model their interaction. The complete system, MDeePred, is a new method for computational drug discovery and repositioning. We evaluated MDeePred on well-known benchmark datasets and compared its performance with state-of-the-art methods. We also performed an in vitro comparative analysis of MDeePred predictions against the action of selected kinase inhibitors on cancer cells. MDeePred is a scalable method with sufficiently high predictive performance. The featurization approach proposed here can also be utilized for other protein-related predictive tasks.
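To make the idea of stacking several 2D protein feature maps concrete, the following is a minimal sketch and not the MDeePred implementation. It assumes that each feature type can be expressed as a 20 x 20 amino acid pair table and that entry (i, j) of a channel is the table value for residues i and j of the sequence. The function names (`identity_table`, `hydrophobicity_table`, `featurize`), the two toy tables and the zero-padding scheme are all hypothetical choices made for illustration; the hydrophobicity values are the Kyte-Doolittle scale.

```python
import numpy as np

# The 20 standard amino acids; the order only fixes the table indexing.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values, used here as a toy physicochemical property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def identity_table():
    """Toy sequence channel: 1 where two residues are identical, else 0."""
    return np.eye(len(AMINO_ACIDS))


def hydrophobicity_table():
    """Toy physicochemical channel: negative absolute hydropathy difference."""
    h = np.array([KD[aa] for aa in AMINO_ACIDS])
    return -np.abs(h[:, None] - h[None, :])


def featurize(sequence, tables, max_len):
    """Return an array of shape (channels, max_len, max_len).

    Sequences longer than max_len are truncated; shorter ones are
    zero-padded. Non-standard residues raise a KeyError in this sketch.
    """
    idx = np.array([AA_INDEX[aa] for aa in sequence[:max_len]])
    n = len(idx)
    out = np.zeros((len(tables), max_len, max_len), dtype=np.float32)
    for channel, table in enumerate(tables):
        # np.ix_ selects the residue-by-residue sub-table in one step.
        out[channel, :n, :n] = table[np.ix_(idx, idx)]
    return out


tables = [identity_table(), hydrophobicity_table()]
features = featurize("ACD", tables, max_len=5)
print(features.shape)
```

The resulting array has one channel per feature table, so it can be passed to a convolutional network in the same way as a multi-channel image. In a pairwise input setting, a separate compound representation would be fed through its own branch and combined with the protein branch before the regression output.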
The source code, datasets, additional information and user instructions of MDeePred are available at https://github.com/cansyl/MDeePred.
Supplementary data are available at Bioinformatics online.