Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, MSN 3030, Kansas City, Kansas 66160, USA.
Proteins. 2011 May;79(5):1589-608. doi: 10.1002/prot.22985. Epub 2011 Mar 4.
Concomitant with the genomic era, many bioinformatics programs have been developed to identify functionally important positions from sequence alignments of protein families. To evaluate these analyses, many have used the LacI/GalR family and determined whether positions predicted to be "important" are validated by published experiments. However, we previously noted that predictions do not identify all of the experimentally important positions present in the linker regions of these homologs. In an attempt to reconcile these differences, we corrected and expanded the LacI/GalR sequence set commonly used in sequence/function analyses. Next, a variety of analyses were carried out (1) for the entire LacI/GalR sequence set and (2) for a subset of homologs with functionally important "YxPxxxAxxL" motifs in their linkers. This strategy was devised to determine whether predictions could be improved by knowledge-based sequence sorting; for some analyses, it did increase the number of linker positions identified. However, two functionally important linker positions were not reliably identified by any analysis. Finally, we compared the new predictions to all known experimental data for E. coli LacI and three homologous linkers. From these, we estimate that >50% of positions are important to the functions of the LacI/GalR homologs. In corollary, neutral positions might occur less frequently and might be easier to detect in sequence analyses. Although analyses have successfully guided mutations that partially exchange protein functions, a better experimental understanding of the sequence/function relationships in protein families would be helpful for uncovering the remaining rules used by nature to evolve new protein functions.
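The knowledge-based sorting step described above, which separates homologs that carry a "YxPxxxAxxL" motif from those that do not, could be sketched roughly as follows. This is only an illustration and not the authors' procedure. It assumes that "x" stands for any single residue and that sequences are ungapped strings. It also searches the whole sequence, whereas the study considers the motif within the linker. The function name and the toy sequences are hypothetical.

```python
import re

# Assumption: "x" in "YxPxxxAxxL" means "any one residue".
MOTIF = re.compile(r"Y.P...A..L")


def split_by_motif(sequences):
    """Partition a {name: sequence} dict into (with_motif, without_motif)."""
    with_motif, without_motif = {}, {}
    for name, seq in sequences.items():
        # Simplification: search the full sequence, not just the linker region.
        target = with_motif if MOTIF.search(seq.upper()) else without_motif
        target[name] = seq
    return with_motif, without_motif


# Made-up toy sequences for illustration; these are not real LacI/GalR linkers.
toy = {
    "homolog_A": "MKTYRPNAVARSLGG",
    "homolog_B": "MKTWRPNAVGRSFGG",
}

with_motif, without_motif = split_by_motif(toy)
print(sorted(with_motif), sorted(without_motif))
```

A real analysis would more likely work from an alignment and restrict the search to the aligned linker columns, so that a chance match elsewhere in the protein does not place a homolog in the wrong subset.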