Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
Bioinformatics. 2023 Sep 2;39(9). doi: 10.1093/bioinformatics/btad510.
Molecular-level classification of protein-protein interfaces can greatly assist in functional characterization and rational drug design. The most accurate protein interface predictions rely on finding homologous proteins with known interfaces since most interfaces are conserved within the same protein family. The accuracy of these template-based prediction approaches depends on the correct choice of suitable templates. Choosing the right templates in the immunoglobulin superfamily (IgSF) is challenging because its members share low sequence identity and display a wide range of alternative binding sites despite structural homology.
We present a new approach to predict protein interfaces. First, template-specific, informative evolutionary profiles are established using a mutual-information-based approach. Next, based on the similarity of residue-level conservation scores derived from these profiles, a query protein is hierarchically clustered with all template proteins in its superfamily that have known interface definitions. From the resulting clusters, a subset of the most closely related templates is selected and an interface prediction is made. These initial interface predictions are subsequently refined by extensive docking. The method was benchmarked on 51 IgSF proteins and predicts nontrivial interfaces of IgSF proteins with an average and median F-score of 0.64 and 0.78, respectively. We also provide a way to assess the confidence of the results. The average and median F-scores increase to 0.8 and 0.81, respectively, if 27% of low-confidence cases and 17% of medium-confidence cases are removed. Lastly, we provide residue-level interface predictions, protein complexes, and confidence measurements for singletons in the IgSF.
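To make the reported metric concrete, the following is a minimal sketch of how an F-score can be computed for a residue-level interface prediction. It assumes the balanced F1 definition (the harmonic mean of precision and recall) and that interfaces are represented as sets of residue numbers; the function name `interface_f_score` and the example residue numbers are hypothetical and are not taken from the published source code.

```python
def interface_f_score(predicted, actual):
    """Balanced F-score (F1) between predicted and known interface residues.

    Both arguments are iterables of residue identifiers (assumed here to be
    residue numbers). Returns 0.0 when no residue is correctly predicted.
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # fraction of predictions that are correct
    recall = true_positives / len(actual)        # fraction of the true interface recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 of 4 predicted residues lie in a 5-residue interface.
example_score = interface_f_score({10, 11, 12, 45}, {11, 12, 45, 46, 47})
```

In this example precision is 0.75 and recall is 0.6. A few badly predicted cases scoring near zero pull the benchmark average down much more than the median, which is one plausible reading of the gap between the reported average (0.64) and median (0.78).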
Source code is freely available at: https://gitlab.com/fiserlab.org/interdct_with_refinement.