Donnelly Center for Cellular and Biomolecular Research, Banting and Best Department of Medical Research, University of Toronto, Toronto ON, Canada.
BMC Bioinformatics. 2010 Oct 12;11:507. doi: 10.1186/1471-2105-11-507.
PDZ domains mediate protein-protein interactions involved in important biological processes by recognizing short linear motifs in their target proteins. Two recent independent studies used protein microarray or phage display technology to detect PDZ domain interactions with peptide ligands on a large scale. Several computational predictors of PDZ domain interactions have been developed; however, they are trained using only protein microarray data and focus on limited subsets of PDZ domains. An accurate predictor of genomic PDZ domain interactions would allow the proteomes of organisms to be scanned for potential binders. Because a given proteome contains a large number of possible interactors, such an application requires a predictor that is both accurate and precise, so that it does not generate too many false positive hits. Once validated, these predictions will help to increase the coverage of current PDZ domain interaction networks and further our understanding of the roles that PDZ domains play in a variety of biological processes.
We developed a PDZ domain interaction predictor using a support vector machine (SVM) trained with both protein microarray and phage display data. Because the phage display data contain only positive interactions, we developed a method to generate artificial negative interactions so that these data could be used for training. Using cross-validation and a series of independent tests, we showed that our SVM successfully predicts interactions in different organisms. We then used the SVM to scan the proteomes of human, worm and fly to predict binders for several PDZ domains. Predictions were validated using known genomic interactions and published protein microarray experiments. Based on our results, we predicted new protein interactions potentially associated with Usher and Bardet-Biedl syndromes. A comparison of performance measures (F1 measure and false positive rate, FPR) between the SVM and published predictors demonstrated our SVM's improved accuracy and precision at proteome scanning.
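As an illustration of the two performance measures named above, the following minimal Python sketch computes the F1 measure and the FPR from binary labels using their standard definitions. The function name `f1_and_fpr` and the toy labels are hypothetical and are not taken from the study; the sketch does not reproduce the authors' evaluation pipeline.

```python
def f1_and_fpr(y_true, y_pred):
    """Return (F1, FPR) for binary labels, where 1 = interaction, 0 = no interaction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Toy example (made-up labels): 3 true binders and 5 non-binders.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
f1, fpr = f1_and_fpr(y_true, y_pred)
print(f"F1 = {f1:.3f}, FPR = {fpr:.3f}")
```

In a proteome scan the non-binders vastly outnumber the binders, so even a small FPR can translate into many false positive hits; this is presumably why the FPR is reported alongside the F1 measure.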
We built an SVM using mouse and human experimental training data to predict PDZ domain interactions. We showed that it correctly predicts known interactions from the proteomes of different organisms and is more accurate and precise at proteome scanning than published state-of-the-art predictors.