Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, 35131, Italy.
BMC Bioinformatics. 2018 Feb 6;19(1):35. doi: 10.1186/s12859-018-2043-3.
The correct determination of protein-protein interaction interfaces is important for understanding disease mechanisms and for rational drug design. To date, several computational methods for the prediction of protein interfaces have been developed, but the interface prediction problem is still not fully understood. Experimental evidence suggests that the location of binding sites is imprinted in the protein structure, but there are major differences among the interfaces of the various protein types: the characterising properties can vary considerably depending on the interaction type and function. Two things are therefore of paramount importance for this task: selecting an optimal set of features to characterise the protein interface, and developing an effective method to represent and capture the complex protein recognition patterns.
In this work we investigate the potential of a novel local surface descriptor based on 3D Zernike moments for the interface prediction task. Descriptors invariant to roto-translations are extracted from circular patches of the protein surface enriched with physico-chemical properties from the HQI8 amino acid index set, and are used as samples for a binary classification problem. A Support Vector Machine classifier is used to distinguish interface local surface patches from non-interface ones. The proposed method was validated on 16 classes of proteins extracted from the Protein-Protein Docking Benchmark 5.0 and compared to other state-of-the-art protein interface predictors (SPPIDER, PrISE and NPS-HomPPI).
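The roto-translation invariance mentioned above can be illustrated with a deliberately simplified sketch. It does not compute 3D Zernike moments. As a stand-in, it uses low-order moments of a scalar property sampled on a patch and takes the norm of each moment, which is the same idea by which 3D Zernike descriptors obtain rotation invariance (the norm over the components of each order). The function names, the sample coordinates and the property values are all hypothetical and chosen only for illustration.

```python
import math

def centre(points):
    """Shift points so that their centroid is the origin (translation invariance)."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [[p[k] - c[k] for k in range(3)] for p in points]

def invariant_descriptor(points, values):
    """Toy rotation- and translation-invariant descriptor (NOT 3D Zernike).

    points: 3D sample positions on a surface patch
    values: a scalar property (e.g. one amino acid index) at each point
    Returns the norms of the order-0, order-1 and order-2 moments.
    """
    pts = centre(points)
    m0 = sum(values)                                  # order 0: scalar
    m1 = [sum(v * p[k] for p, v in zip(pts, values))  # order 1: vector
          for k in range(3)]
    m2 = [[0.0] * 3 for _ in range(3)]                # order 2: traceless tensor
    for p, v in zip(pts, values):
        r2 = sum(x * x for x in p)
        for a in range(3):
            for b in range(3):
                m2[a][b] += v * (p[a] * p[b] - (r2 / 3.0 if a == b else 0.0))
    return [
        m0,
        math.sqrt(sum(x * x for x in m1)),
        math.sqrt(sum(x * x for row in m2 for x in row)),
    ]

def rotate(points, alpha, beta):
    """Rotate about the z axis by alpha, then about the x axis by beta (radians)."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    out = []
    for x, y, z in points:
        x, y = ca * x - sa * y, sa * x + ca * y
        y, z = cb * y - sb * z, sb * y + cb * z
        out.append([x, y, z])
    return out

# Hypothetical patch: four surface points carrying one property value each.
points = [[1.0, 0.2, 0.0], [0.3, 1.1, 0.4], [-0.7, 0.5, 0.9], [0.1, -0.8, 0.3]]
values = [0.5, -1.2, 2.0, 0.7]

d_ref = invariant_descriptor(points, values)

# Apply an arbitrary rotation followed by a translation to the same patch.
moved = [[x + 5.0, y - 2.0, z + 1.5] for x, y, z in rotate(points, 0.7, -1.3)]
d_moved = invariant_descriptor(moved, values)

print(d_ref)
print(d_moved)
```

The two printed descriptors agree up to floating-point error, so a classifier trained on such vectors does not need the patches to be aligned beforehand. In a pipeline like the one described, vectors of this kind, one per patch and labelled interface or non-interface, would be the samples passed to the classifier.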
The 3D Zernike descriptors are able to capture the similarity among patterns of physico-chemical and biochemical properties mapped on the protein surface arising from the various spatial arrangements of the underlying residues, and their usage can be easily extended to other sets of amino acid properties. The results suggest that the choice of a proper set of features characterising the protein interface is crucial for the interface prediction task, and that optimality strongly depends on the class of proteins whose interface we want to characterise. We postulate that different protein classes should be treated separately and that it is necessary to identify an optimal set of features for each protein class.