Kann M, Qian B, Goldstein R A
Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109-1055, USA.
Proteins. 2000 Dec 1;41(4):498-503. doi: 10.1002/1097-0134(20001201)41:4<498::aid-prot70>3.0.co;2-3.
The growth in protein sequence data has placed a premium on ways to infer structure and function of the newly sequenced proteins. One of the most effective ways is to identify a homologous relationship with a protein about which more is known. While close evolutionary relationships can be confidently determined with standard methods, the difficulty increases as the relationships become more distant. All of these methods rely on some score function to measure sequence similarity. The choice of score function is especially critical for these distant relationships. We describe a new method of determining a score function, optimizing the ability to discriminate between homologs and non-homologs. We find that this new score function performs better than standard score functions for the identification of distant homologies.
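As a rough illustration of what "score function" and "discriminate" mean here, the sketch below sums pairwise substitution scores over an ungapped alignment, then measures how far homolog scores sit above non-homolog scores. It is a toy under stated assumptions, not the authors' procedure: the abstract does not give the optimization objective, and the names `alignment_score`, `separation`, and `TOY_SCORE`, the two-letter alphabet, and the separation measure are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical toy substitution scores over a two-letter alphabet.
# Keys are unordered residue pairs, so (A, G) and (G, A) share one entry.
TOY_SCORE = {
    frozenset("A"): 2,    # A aligned with A
    frozenset("G"): 2,    # G aligned with G
    frozenset("AG"): -1,  # A aligned with G
}

def alignment_score(seq_a, seq_b, score=TOY_SCORE):
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(score[frozenset((a, b))] for a, b in zip(seq_a, seq_b))

def separation(homolog_scores, nonhomolog_scores):
    """Assumed discrimination measure: the gap between mean scores,
    in units of the non-homolog sample standard deviation."""
    gap = mean(homolog_scores) - mean(nonhomolog_scores)
    return gap / stdev(nonhomolog_scores)
```

Under this assumed measure, a score function that discriminates better is one whose entries give a larger `separation` on a labeled set of homologous and non-homologous pairs.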