Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University.
Key Laboratory of System Control and Information Processing, Ministry of Education of China, 200240 Shanghai, China.
Bioinformatics. 2020 May 1;36(10):3018-3027. doi: 10.1093/bioinformatics/btaa110.
Knowledge of protein-ligand binding residues is important for understanding the functions of proteins and their interaction mechanisms. Accurately identifying the potential binding sites of a specific ligand on a protein from its experimentally solved structure remains a challenging problem. Compared with structure-alignment-based methods, machine learning algorithms provide a flexible alternative that is less dependent on annotated homologous protein structures. Several factors are important for an effective protein-ligand binding prediction model, e.g. a discriminative feature representation and a learning architecture that can handle data that are both large-scale and severely imbalanced.
In this study, we propose a novel deep-learning-based method called DELIA for protein-ligand binding residue prediction. In DELIA, a hybrid deep neural network is designed to integrate 1D sequence-based features with 2D structure-based amino acid distance matrices. To overcome the severe data imbalance between binding and nonbinding residues, three strategies are used to enhance the model: oversampling within each mini-batch, random undersampling and a stacking ensemble. Experimental results on five benchmark datasets demonstrate the effectiveness of the proposed DELIA pipeline.
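As a rough illustration of two ideas named above, the following sketch builds a residue-residue distance matrix and draws a class-balanced mini-batch. It is not DELIA's implementation: the abstract does not say which atoms define the distances or what class ratio is used, so the use of one 3D coordinate per residue (for example the C-alpha atom), the `pos_fraction` parameter and both function names are assumptions made here for illustration.

```python
import math
import random


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates.

    Assumption: one (x, y, z) tuple per residue, e.g. its C-alpha atom.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def oversampled_minibatch(labels, batch_size, pos_fraction, rng):
    """Return residue indices with about `pos_fraction` binding residues.

    Both classes are drawn with replacement, so the rare binding residues
    (label 1) can appear several times in one batch, while only a random
    subset of the abundant non-binding residues (label 0) is used.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one non-binding residue")
    n_pos = max(1, round(batch_size * pos_fraction))
    n_neg = batch_size - n_pos
    batch = rng.choices(pos, k=n_pos) + rng.choices(neg, k=n_neg)
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    # Three toy residues; real input would come from a solved structure.
    toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
    print(distance_matrix(toy_coords)[0])

    # 2 binding residues out of 20: a 1:9 imbalance before resampling.
    toy_labels = [0] * 18 + [1] * 2
    batch = oversampled_minibatch(toy_labels, 8, 0.5, random.Random(0))
    print(sum(toy_labels[i] for i in batch), "of", len(batch), "are binding")
```

Balancing inside each batch, rather than duplicating minority examples in the whole training set, keeps the dataset size unchanged while still showing the network binding residues at every update; whether DELIA balances to an even ratio is not stated in the abstract.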
The web server of DELIA is available at www.csbio.sjtu.edu.cn/bioinf/delia/.
Supplementary data are available at Bioinformatics online.