UCD Complex and Adaptive Systems Laboratory, University College Dublin, Dublin, Ireland.
Front Biosci (Landmark Ed). 2010 Jun 1;15(3):801-25. doi: 10.2741/3647.
Short linear motifs (SLiMs) in proteins can act as targets for proteolytic cleavage, sites of post-translational modification, determinants of sub-cellular localization, and mediators of protein-protein interactions. Computational discovery of SLiMs involves four steps: assembling a group of proteins postulated to share a potential motif, masking residues less likely to contain such a motif, down-weighting shared motifs that arise through common evolutionary descent, and calculating statistical probabilities that allow for the multiple testing of all possible motifs. Much of the challenge for motif discovery lies in assembling and masking datasets of proteins likely to share motifs: because the motifs are typically short (between 3 and 10 amino acids in length), potential signals are easily swamped by the noise of stochastically recurring motifs. Focusing on disordered regions of proteins, where SLiMs are predominantly found, and masking non-conserved residues can reduce the level of noise, but more work is required to improve the quality of high-throughput experimental datasets (e.g. of physical protein interactions) used as input for computational discovery.
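The masking, counting, and multiple-testing steps described above can be illustrated with a minimal Python sketch. It is not the method of any particular SLiM discovery tool. All function names (`mask_sequence`, `motif_support`, `p_at_least`, `score_motifs`) are hypothetical, and the sketch makes several simplifying assumptions: motifs are fixed-length with no wildcard positions; the input proteins are unrelated, so the down-weighting step for common descent is omitted; the per-protein chance of a motif is averaged across proteins so that a single binomial can be used; and a Bonferroni correction over the observed alphabet stands in for a proper treatment of the motif space.

```python
from collections import Counter
from math import comb, prod


def mask_sequence(seq, keep):
    """Replace residues at positions where keep is False with 'X'.

    `keep` is a hypothetical per-residue flag list, e.g. True for residues
    predicted to be disordered and conserved.
    """
    return "".join(aa if k else "X" for aa, k in zip(seq, keep))


def motif_support(seqs, k):
    """For each unmasked k-mer, count how many sequences contain it at least once."""
    support = Counter()
    for seq in seqs:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(m for m in kmers if "X" not in m)
    return support


def p_at_least(n, s, p):
    """Binomial tail: probability that at least s of n sequences contain the motif."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(s, n + 1))


def score_motifs(seqs, k=3):
    """Return (motif, support, corrected p) tuples, most significant first."""
    residues = [aa for seq in seqs for aa in seq if aa != "X"]
    freq = {aa: c / len(residues) for aa, c in Counter(residues).items()}
    support = motif_support(seqs, k)
    n_tests = len(freq) ** k  # assumption: Bonferroni over all k-mers of the observed alphabet
    results = []
    for motif, s in support.items():
        p_site = prod(freq[aa] for aa in motif)  # chance of the motif at one position
        # Assumption: average the per-sequence chance of >= 1 occurrence,
        # ignoring masking when counting candidate start positions.
        p_seq = sum(1 - (1 - p_site) ** max(len(q) - k + 1, 0) for q in seqs) / len(seqs)
        p_val = p_at_least(len(seqs), s, p_seq)
        results.append((motif, s, min(1.0, p_val * n_tests)))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy, made-up sequences sharing the 3-mer "PKL"
    toy = ["AAPKLMGG", "CCPKLTTT", "DDEPKLWW"]
    for motif, s, p in score_motifs(toy)[:3]:
        print(motif, s, p)
```

Even in this toy setting, a short motif shared by only three proteins is separated from the background mainly because every other 3-mer occurs in a single protein; with larger or noisier datasets, chance recurrences would rapidly erode that margin, which is why dataset assembly and masking carry so much weight.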