Biomedical Informatics Research Laboratory (BIRL), Department of Computer and Information Sciences (DCIS), Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, ISL, 45650, Pakistan.
Information Technology Center (ITC), University of Azad Jammu & Kashmir, Muzaffarabad, Azad Kashmir, 13100, Pakistan.
BMC Bioinformatics. 2018 Nov 15;19(1):425. doi: 10.1186/s12859-018-2448-z.
Determining protein-protein interactions and their binding affinity is important for understanding cellular biological processes, discovering and designing novel therapeutics, protein engineering, and mutagenesis studies. Because wet lab experiments require considerable time and effort, computational prediction of binding affinity from sequence or structure is an important area of research. Structure-based methods, though more accurate than sequence-based techniques, are limited in their applicability because protein structure data are available for relatively few proteins.
In this study, we propose a novel machine learning method for predicting binding affinity that uses protein 3D structure as privileged information at training time while requiring only protein sequence information at test time. The method is based on the framework of learning using privileged information (LUPI). With it, we have achieved improved performance over corresponding sequence-based binding affinity prediction methods that do not have access to privileged information during training. Our experiments show that the proposed framework, which uses structure only during training, can achieve classification performance comparable to that obtained using structure-based features. Evaluation on an independent test set also shows improved performance over the PPA-Pred2 method.
The proposed method outperforms several baseline learners and a state-of-the-art binding affinity predictor, both in cross-validation and on an additional validation dataset. This demonstrates the utility of the LUPI framework for problems that would benefit from classification using structure-based features. The implementation of LUPI developed for this work is expected to be useful in other areas of bioinformatics as well.
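To make the idea of using structure only at training time concrete, the following is a minimal sketch of one way to realize LUPI. It is not the implementation developed for this work: it substitutes generalized distillation (a teacher trained on privileged features whose soft predictions guide a student trained on regular features) for whatever formulation the authors used. The function names, the `imitation` and `temperature` parameters, and the synthetic "sequence" and "structure" features are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    # Append a constant column so the last weight acts as the intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic(X, targets, lr=0.1, epochs=500, l2=1e-3):
    """Logistic regression by gradient descent; targets may be soft labels in [0, 1]."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        grad = Xb.T @ (sigmoid(Xb @ w) - targets) / len(targets) + l2 * w
        w -= lr * grad
    return w

def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)

def train_lupi_student(X_seq, X_struct, y, imitation=0.5, temperature=2.0):
    """Hypothetical helper: train a sequence-only student guided by a structure-based teacher."""
    # 1. The teacher sees the privileged (structure) features.
    w_teacher = fit_logistic(X_struct, y)
    # 2. Softened teacher predictions carry information about how hard each example is.
    soft = sigmoid((add_bias(X_struct) @ w_teacher) / temperature)
    # 3. The student sees only regular (sequence) features and fits a blend
    #    of the hard labels and the teacher's soft labels.
    targets = (1.0 - imitation) * y + imitation * soft
    return fit_logistic(X_seq, targets)

# Synthetic stand-in data (assumption): the label depends on a "structure"
# feature, and the "sequence" features are a noisy view of it.
rng = np.random.default_rng(0)
n = 200
X_struct = rng.normal(size=(n, 3))
y = (X_struct[:, 0] > 0).astype(float)  # 1 = high affinity, 0 = low affinity
X_seq = np.hstack([X_struct[:, :1] + 0.5 * rng.normal(size=(n, 1)),
                   rng.normal(size=(n, 1))])

student_w = train_lupi_student(X_seq, X_struct, y)
# At test time only sequence features are needed.
probs = predict_proba(student_w, X_seq)
```

Setting `imitation=0` in this sketch reduces the student to an ordinary sequence-only classifier, which gives a natural baseline for checking whether the privileged information helps.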