Pawłowski K, Jaroszewski L, Rychlewski L, Godzik A
Burnham Institute, La Jolla, CA 92037, USA.
Pac Symp Biocomput. 2000:42-53. doi: 10.1142/9789814447331_0005.
Protein function assignments based on postulated homology, as recognized by high sequence similarity, are used routinely in genome analysis. Improvements in the sensitivity of sequence comparison algorithms have reached the point where proteins with previously undetectable sequence similarity, for instance 10-15% identical residues, can sometimes be classified as similar. What is the relation between such proteins? Is it possible that they are homologous? What is the practical significance of detecting such similarities? A simplified analysis of the relation between sequence similarity and functional similarity is presented here for the well-characterized proteins from the E. coli genome. Using a simple measure of functional similarity based on the E.C. classification of enzymes, it is shown that functional similarity correlates well with sequence similarity as measured by the statistical significance of the alignment score. Proteins that are similar by this standard, even in cases of low sequence identity, are much more likely to have similar function than randomly chosen protein pairs. Interesting exceptions to these rules are discussed.
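The abstract does not spell out how the E.C.-based measure of functional similarity is computed. One plausible reading, offered here only as an illustrative sketch and not as the authors' definition, is to count how many leading levels of the four-level E.C. number two enzymes share. The function name `ec_similarity` and the handling of unassigned levels (written as `-`) are assumptions made for this example.

```python
def ec_similarity(ec_a: str, ec_b: str) -> int:
    """Count the leading E.C. levels on which two enzyme numbers agree (0 to 4).

    Hypothetical measure for illustration; the paper's exact definition
    is not given in the abstract.
    """
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        # Assumption: a dash marks an unassigned level and never counts as a match.
        if level_a != level_b or level_a == "-":
            break
        shared += 1
    return shared


if __name__ == "__main__":
    # Same class, subclass and sub-subclass; only the serial number differs.
    print(ec_similarity("1.1.1.1", "1.1.1.2"))
    # Different top-level class, so no levels are shared.
    print(ec_similarity("1.1.1.1", "2.7.1.1"))
```

Under this reading, a score of 4 means the two proteins carry the same E.C. number, while a score of 0 means they belong to different enzyme classes. Such scores could then be compared against alignment significance for each protein pair.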