Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA), Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain.
BMC Bioinformatics. 2013 Jul 18;14:229. doi: 10.1186/1471-2105-14-229.
Most proteins have evolved in specific cellular compartments that limit their functions and potential interactions. Motifs, in turn, define amino acid arrangements conserved between protein family members and are powerful tools for assigning function to protein sequences. The ideal motif would identify all members of a protein family and no others, but in practice many motifs identify both family members and unrelated proteins, referred to as True Positive (TP) and False Positive (FP) sequences, respectively.
To address the relationship between protein motifs, protein function and cellular localization, we systematically assigned subcellular localization data to motif sequences from the comprehensive PROSITE sequence motif database. Using these data, we analyzed relationships between localization and function. We find that TPs and FPs have a strong tendency to localize in different compartments. When multiple localizations are considered, TPs are usually distributed between related cellular compartments. We also identified cases where FPs are concentrated in particular subcellular regions, indicating possible functional or evolutionary relationships with TP sequences of the same motif.
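As a rough illustration of the kind of comparison described above, the following minimal sketch tallies the compartments of TP and FP hits for a single motif and reports the most frequent compartment for each class. The function names, the input layout (pairs of class label and compartment) and the toy records are assumptions made for this example. They are not taken from the study and do not reproduce its actual pipeline or data.

```python
from collections import Counter


def localization_profile(hits):
    """Return, per class ("TP"/"FP"), the fraction of hits in each compartment.

    `hits` is an iterable of (label, compartment) pairs; this layout is a
    hypothetical one chosen for the sketch.
    """
    counts = {"TP": Counter(), "FP": Counter()}
    for label, compartment in hits:
        counts[label][compartment] += 1

    profile = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        # Leave the profile empty when a class has no annotated hits.
        profile[label] = (
            {c: n / total for c, n in counter.items()} if total else {}
        )
    return profile


def dominant_compartment(fractions):
    """Return the compartment with the largest fraction, or None if empty."""
    return max(fractions, key=fractions.get) if fractions else None


# Invented toy records for one motif (not real PROSITE data).
hits = [
    ("TP", "nucleus"),
    ("TP", "nucleus"),
    ("TP", "cytoplasm"),
    ("FP", "membrane"),
    ("FP", "membrane"),
    ("FP", "nucleus"),
]

profile = localization_profile(hits)
tp_main = dominant_compartment(profile["TP"])
fp_main = dominant_compartment(profile["FP"])
print(tp_main, fp_main)
```

In this toy case the TP and FP hits concentrate in different compartments. A real analysis would need curated localization annotations and a statistical test across many motifs rather than a simple majority count.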
Our findings suggest that the systematic examination of subcellular localization has the potential to uncover evolutionary and functional relationships between motif-containing sequences. We believe that this type of analysis complements existing motif annotations and could aid in their interpretation. Our results shed light on the evolution of cellular organelles and potentially establish the basis for new subcellular localization and function prediction algorithms.