Cluster based prediction of PDZ-peptide interactions.

Publication information

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2164-15-S1-S5. Epub 2014 Jan 24.

Abstract

BACKGROUND

PDZ domains are among the most promiscuous protein recognition modules; they bind short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screens, phage display) have been applied to determine the in vitro binding specificity of PDZ domains. Many computational methods are currently available to predict PDZ-peptide interactions, but they often provide domain-specific models and/or have limited domain coverage.

RESULTS

Here, we compiled the largest set of PDZ domains derived from the human, mouse, fly and worm proteomes and defined binding models for PDZ domain families to improve domain coverage and prediction specificity. To this end, we first identified a novel set of 138 PDZ families, comprising 548 PDZ domains from these organisms, by efficiently clustering the domains according to their sequence identity. For 43 PDZ families, covering 226 PDZ domains with available interaction data, we built specialized models using a support vector machine approach. The advantage of family-wise models is that they can also be used to determine the binding specificity of a newly characterized PDZ domain that has sufficient sequence identity to a known family. Since most current experimental approaches provide only positive data, we have to cope with the class imbalance problem. To enrich the negative class, we therefore introduced a semi-supervised technique to generate high-confidence non-interaction data. We report competitive predictive performance with respect to state-of-the-art approaches.
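As a rough illustration of grouping domains into families by sequence identity, the sketch below links any two pre-aligned sequences whose fraction of identical positions reaches a threshold and reports the connected groups. The abstract does not state the clustering algorithm or the cutoff actually used, so the single-linkage rule, the 0.8 threshold, and the toy sequences and names (`dom1` to `dom4`) are all assumptions made only for this example.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_identity(seqs, threshold):
    """Single-linkage clustering (an assumed rule): link pairs with identity >= threshold."""
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for name in seqs:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())


# Hypothetical toy sequences, not real PDZ domains.
toy = {
    "dom1": "GLGFSIAGGT",
    "dom2": "GLGFSIAGGS",
    "dom3": "RKGFNVVGGE",
    "dom4": "RKGFNVVGGD",
}
families = cluster_by_identity(toy, threshold=0.8)
```

With these toy inputs, `dom1` and `dom2` share 9 of 10 positions, as do `dom3` and `dom4`, while cross-pair identity is well below the threshold, so two families result. A newly characterized domain could be assigned to a family in the same way, by checking its identity against existing members.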

CONCLUSIONS

Our approach makes several contributions. First, we show that domain coverage can be increased by applying an accurate clustering technique. Second, we developed an approach based on a semi-supervised strategy to obtain high-confidence negative data. Third, we allowed high-order correlations between the amino acid positions in the binding peptides. Fourth, our method is general enough to be readily applicable to other peptide recognition modules such as SH2 domains. Finally, we performed a genome-wide prediction for 101 human and 102 mouse PDZ domains and uncovered novel interactions with biological relevance. We make all the predictive models and genome-wide predictions freely available to the scientific community.
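One common way for a support vector machine to capture correlations between peptide positions is a polynomial kernel over one-hot encoded residues. The abstract does not say which kernel or encoding was used, so the following is only a minimal sketch under that assumption; `one_hot`, `poly_kernel`, the degree of 2, and the example peptides are hypothetical choices for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Encode each residue as a 20-dimensional indicator vector, concatenated by position."""
    vec = []
    for residue in peptide:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def poly_kernel(p, q, degree=2, coef0=1.0):
    """Polynomial kernel (dot + coef0) ** degree on one-hot encodings (assumed form).

    The dot product counts positions where p and q carry the same residue.
    Raising it to a power > 1 implicitly adds features for combinations of
    positions, which is how correlations between positions enter the model.
    """
    dot = sum(x * y for x, y in zip(one_hot(p), one_hot(q)))
    return (dot + coef0) ** degree


# Hypothetical C-terminal tetrapeptides.
k_same = poly_kernel("ETSV", "ETSV")   # 4 matching positions
k_close = poly_kernel("ETSV", "QTSV")  # 3 matching positions
k_far = poly_kernel("ETSV", "ETDL")    # 2 matching positions
```

With a degree of 1 the kernel would treat each position independently; higher degrees let the classifier weight joint occurrences of residues at different positions.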


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e63f/4046824/bfaf601b9b4b/12864_2014_5678_Fig1_HTML.jpg
