Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA.
BMC Bioinformatics. 2010 May 11;11:243. doi: 10.1186/1471-2105-11-243.
Many protein interactions, especially those involved in signaling, are mediated by short linear motifs of 5-10 amino acid residues that interact with modular protein domains such as SH3 domains and kinase catalytic domains. A straightforward way to identify these interactions is to scan every sequence in a target proteome for matches to the motif. However, predicting domain targets from motif sequence alone, without considering other genomic and structural information, has been shown to lack accuracy.
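The naive proteome scan described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' search algorithm: the toy proteome is hypothetical, and the motif is an assumed example, the class I SH3-binding pattern [RK]xxPxxP written as a regular expression. Wrapping the pattern in a lookahead lets overlapping matches be reported.

```python
import re

# Hypothetical toy proteome: protein ID -> sequence (not real data).
PROTEOME = {
    "protA": "MSRPLPPLPEGK",
    "protB": "MKTAYIAKQR",
    "protC": "AKPLPPRPSRPAPPLP",
}

# Assumed example motif: class I SH3-binding pattern [RK]xxPxxP as a regex.
SH3_CLASS_I = "[RK]..P..P"


def scan_proteome(pattern, proteome):
    """Return (protein_id, start, peptide) for every match, including overlaps."""
    # A zero-width lookahead with a capturing group lets finditer report
    # matches that overlap one another.
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for pid, seq in proteome.items():
        for m in regex.finditer(seq):
            hits.append((pid, m.start(), m.group(1)))
    return hits


for hit in scan_proteome(SH3_CLASS_I, PROTEOME):
    print(hit)
```

Every such hit is only a sequence-level candidate; as the text notes, sequence matching alone tends to produce many false positives.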
We developed an efficient search algorithm that scans the target proteome for potential domain targets and improves the accuracy of each hit by integrating a variety of precomputed features, such as conservation, surface propensity, and disorder. The features are integrated using naïve Bayes, trained on a set of experimentally validated examples.
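To make the naïve Bayes integration concrete, the sketch below scores each motif hit by its posterior log-odds: the prior log-odds plus one log-likelihood ratio per feature, which is valid under naïve Bayes's assumption that features are conditionally independent given the label. This is a hedged illustration, not the paper's implementation. The discretized feature values ("high"/"low" conservation, "exposed"/"buried" surface, "yes"/"no" disorder), the Laplace smoothing parameter `alpha`, and the toy training set are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical discretized features attached to each motif hit.
FEATURES = ("conservation", "surface", "disorder")

# Hypothetical toy training set: (features, is_validated_target).
TRAIN = [
    ({"conservation": "high", "surface": "exposed", "disorder": "yes"}, True),
    ({"conservation": "high", "surface": "exposed", "disorder": "no"}, True),
    ({"conservation": "low", "surface": "exposed", "disorder": "yes"}, True),
    ({"conservation": "low", "surface": "buried", "disorder": "no"}, False),
    ({"conservation": "low", "surface": "buried", "disorder": "yes"}, False),
    ({"conservation": "high", "surface": "buried", "disorder": "no"}, False),
]


def train_naive_bayes(examples, alpha=1.0):
    """Return prior log-odds and per-feature log-likelihood-ratio tables.

    Uses Laplace (add-alpha) smoothing; assumes both classes are present.
    """
    pos = [f for f, y in examples if y]
    neg = [f for f, y in examples if not y]
    prior_log_odds = math.log(len(pos) / len(neg))
    tables = {}
    for name in FEATURES:
        values = {f[name] for f, _ in examples}
        k = len(values)
        pos_counts = Counter(f[name] for f in pos)
        neg_counts = Counter(f[name] for f in neg)
        tables[name] = {
            v: math.log((pos_counts[v] + alpha) / (len(pos) + alpha * k))
            - math.log((neg_counts[v] + alpha) / (len(neg) + alpha * k))
            for v in values
        }
    return prior_log_odds, tables


def score_hit(features, prior_log_odds, tables):
    """Posterior log-odds that a hit is a true target (features assumed independent)."""
    # A feature value never seen in training contributes 0 (treated as uninformative).
    return prior_log_odds + sum(tables[n].get(features[n], 0.0) for n in FEATURES)


def posterior_probability(log_odds):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


prior, tables = train_naive_bayes(TRAIN)
good = {"conservation": "high", "surface": "exposed", "disorder": "yes"}
print(posterior_probability(score_hit(good, prior, tables)))
```

Working in log-odds keeps the per-feature contributions additive, so a new precomputed feature can be added to the model as one more table.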
By integrating a variety of biologically relevant features, we demonstrated notably improved prediction of modular protein domain targets. Combined with emerging high-resolution data on domain specificities, we believe our approach can assist in reconstructing many signaling pathways.