Center for Computational Biology and Bioinformatics, College of Engineering, Koc University, Rumelifeneri Yolu 34450 Sariyer, Istanbul, Turkey.
BMC Bioinformatics. 2010 Jun 30;11:357. doi: 10.1186/1471-2105-11-357.
The PDZ domain is a well-conserved structural protein domain found in hundreds of otherwise unrelated signaling proteins. PDZ domains bind to the C-terminal peptides of different proteins and act as glue, clustering protein complexes together, targeting specific proteins, and routing these proteins in signaling pathways. These domains are classified into classes I, II and III, depending on their binding partners and the nature of the bonds formed. The binding specificities of PDZ domains are crucial for understanding the complexity of signaling pathways, yet how these domains recognize and bind their partners remains an open question.
The focus of the current study is twofold: 1) predicting which peptides a PDZ domain will bind, and 2) classifying PDZ domains as Class I, II or I-II, given their primary sequences. Trigram and bigram amino acid frequencies are used as features in machine learning methods. Using 85 PDZ domains and 181 peptides, our model reaches a prediction accuracy of 91.4% for binary interaction prediction, which outperforms previously investigated similar methods. We can also predict the classes of PDZ domains with an accuracy of 90.7%. We propose three critical amino acid sequence motifs that could play important roles in the specificity patterns of PDZ domains.
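The bigram and trigram frequency features mentioned above can be illustrated with a minimal sketch. It assumes overlapping n-grams over the 20 standard amino acids, normalized to relative frequencies; the abstract does not state these details, and the function names and the toy sequence below are hypothetical rather than taken from the study.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_frequencies(sequence, n):
    """Return relative frequencies of overlapping n-grams in a sequence."""
    sequence = sequence.upper()
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    # Drop n-grams that contain non-standard residue symbols.
    grams = [g for g in grams if all(c in AMINO_ACIDS for c in g)]
    total = len(grams)
    if total == 0:
        return {}
    return {g: c / total for g, c in Counter(grams).items()}


def feature_vector(sequence, n):
    """Map a sequence to a fixed-length vector over all 20**n possible n-grams."""
    freqs = ngram_frequencies(sequence, n)
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return [freqs.get(g, 0.0) for g in vocabulary]


# Hypothetical toy sequence, not a real PDZ domain.
toy_sequence = "AAAG"
bigram_features = feature_vector(toy_sequence, 2)    # 400 values
trigram_features = feature_vector(toy_sequence, 3)   # 8000 values
```

Vectors of this kind could then be passed to any standard classifier; which learning algorithm the authors used is not stated in the abstract.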
The performance of our model on the PDZ interaction dataset shows that the approach produces encouraging results. The method can further be used as a virtual screening technique to reduce the search space for putative target proteins and drug-like molecules of PDZ domains.