Davey Norman E, Shields Denis C, Edwards Richard J
UCD Complex and Adaptive Systems Laboratory, UCD Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland.
Bioinformatics. 2009 Feb 15;25(4):443-50. doi: 10.1093/bioinformatics/btn664. Epub 2009 Jan 9.
Short linear motifs (SLiMs) are important mediators of protein-protein interactions. Their short and degenerate nature presents a challenge for computational discovery. We sought to improve SLiM discovery by incorporating evolutionary information, since SLiMs are more conserved than surrounding residues.
We have developed a new method that assesses the evolutionary signal of a residue in its sequence and structural context. Under-conserved residues are masked out prior to SLiM discovery, which allows the conservation information to be incorporated into the existing statistical model employed by SLiMFinder. The method is robust to both the choice of conservation score for individual residues and the size of the sequence neighbourhood. Optimal parameters significantly improve the return of known functional motifs from benchmarking data, raising the return of significant validated SLiMs in typical human interaction datasets from 20% to 60% while retaining the high level of stringency needed for application to real biological data. The success of this regime indicates that it could be of general benefit to computational annotation and prediction of protein function at the sequence level.
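The masking step can be illustrated with a minimal sketch. It is not the SLiMFinder implementation: the function names, the window-based z-score, the `flank` size and the `threshold` value are all assumptions chosen for illustration, and the per-residue conservation scores are assumed to have been computed beforehand from an alignment of orthologues.

```python
from statistics import mean, pstdev


def relative_conservation(scores, flank=30):
    """Score each residue against its local sequence neighbourhood.

    Hypothetical helper: returns a z-score of each residue's conservation
    relative to the residues within `flank` positions on either side.
    """
    relative = []
    for i, score in enumerate(scores):
        window = scores[max(0, i - flank): i + flank + 1]
        mu = mean(window)
        sd = pstdev(window)
        # A flat window carries no local signal, so treat it as neutral.
        relative.append(0.0 if sd == 0 else (score - mu) / sd)
    return relative


def mask_underconserved(sequence, scores, flank=30, threshold=0.0, mask_char="X"):
    """Replace residues whose relative conservation is below `threshold`.

    The threshold of 0.0 (below the local mean) is an assumed default,
    not a value taken from the paper.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    relative = relative_conservation(scores, flank)
    return "".join(
        residue if rel >= threshold else mask_char
        for residue, rel in zip(sequence, relative)
    )


if __name__ == "__main__":
    # Toy input: made-up scores, for illustration only.
    print(mask_underconserved("ACDEFG", [0.1, 0.9, 0.9, 0.1, 0.1, 0.9], flank=1))
```

Because masked positions are simply replaced by a placeholder character, the masked sequences can be passed to a downstream motif-discovery step without changing its statistical model, which is the property the abstract highlights.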
All data and tools in this article are available at http://bioware.ucd.ie/~slimdisc/slimfinder/conmasking/.